What Is a Data Catalog Platform and Do You Need One?


There are two ways to build an impressive structure. 

This is one way:

[Image]

This is another way:

Painting on the wall of the tomb of Rekhmire, the Egyptian vizier in the mid-15th century B.C.E.


Which way do you prefer?


Data under construction

In the corporate world, we rarely find ourselves building things with stone, bricks, or even steel and concrete. Today, we build with data. 


With data, we construct analyses of what happened and predictions of what will happen. With data, we create satisfying customer experiences. With data, we build the revenue and profit of our company. 


But just like any other kind of construction, there are multiple ways to go about it. You could produce a stunning data analysis and spectacular report solely on the basis of manual labor: combing through column after column, table after table. It just might be outdated by the time you actually finish it. And you wouldn’t be able to do anything else in the interim: not exactly an ideal use of human resources.



Modern construction calls for modern tools. And modern data construction calls for modern data tools, one of which is a data catalog platform.


What is a data catalog platform, again?

A data catalog platform organizes all the data assets in a company’s information landscape on the basis of the data’s metadata (i.e. the data about the data). It gives any data asset an intelligent context from which to understand and use it. 


Each data asset’s entry in the metadata catalog includes definitions, descriptions, ratings, data owner and steward, and more, making it simple to search for, identify and evaluate the data you need for any given purpose. 
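To make that concrete, here is a minimal sketch of what one catalog entry could look like as a data structure. The class name, field names, and sample values are all hypothetical, chosen only to mirror the items listed above; they are not taken from any particular catalog product.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class CatalogEntry:
    """One data asset's metadata record (all field names are illustrative)."""
    name: str
    definition: str
    description: str
    owner: str
    steward: str
    ratings: List[int] = field(default_factory=list)  # e.g. 1-5 stars from users

    def average_rating(self) -> Optional[float]:
        """Return the mean rating, or None if nobody has rated the asset yet."""
        return mean(self.ratings) if self.ratings else None


# A made-up entry, just to show the shape of the record.
entry = CatalogEntry(
    name="sales.orders_daily",
    definition="One row per order per calendar day.",
    description="Daily order totals loaded from the order system.",
    owner="Head of Sales Operations",
    steward="Sales data steward",
    ratings=[5, 4, 4],
)

print(entry.owner, entry.average_rating())
```

A real catalog would hold far more than this (lineage, usage history, tags), but even this small record answers the basic questions: what is this asset, who is responsible for it, and what do other users think of it?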


A small business might not need data catalog software; if you have very limited data assets, the cost of organizing and automating your metadata and data catalog management might not be worth it. Some things you can build by hand: contracting with a construction company to install a shelf is usually overkill.


But as soon as you are dealing with data assets in the tens or hundreds of thousands (as are most enterprises today), efficiency dictates either modern automated tools or a team of indentured servants – and the latter is not exactly realistic.


If you find yourself or your team members saying any of the following things, you’ll find that life and work with a data catalog platform are refreshingly simplified and astoundingly productive.



“I spent a long time creating a dataset (or a report)… and after I finished I found out that it existed already.”


Frustrating waste of time and energy #1 = duplicating what already exists. It’s almost inevitable when data assets aren’t organized. 


A data catalog platform serves as a single, searchable repository for all data assets. Finding what you need is step one in any data project, so powerful, user-friendly search and discovery functionality is at the heart of any data catalog worth its salt. Ideally, data discovery on a data catalog should be just as intuitive as searching for a product on an online retail marketplace.  
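As a rough illustration of the idea, the sketch below runs a plain keyword search over a tiny in-memory catalog. The `Asset` class, the `search` function, and the sample assets are assumptions made up for this example; a real catalog's search would also handle ranking, synonyms, and typos.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """A searchable data asset (hypothetical fields)."""
    name: str
    description: str
    tags: List[str] = field(default_factory=list)


def search(assets: List[Asset], query: str) -> List[Asset]:
    """Return assets whose name, description, or tags contain every query word."""
    words = query.lower().split()
    results = []
    for asset in assets:
        # Combine all searchable text into one lowercase string.
        haystack = " ".join([asset.name, asset.description, *asset.tags]).lower()
        if all(word in haystack for word in words):
            results.append(asset)
    return results


catalog = [
    Asset("sales.orders_daily", "Daily order totals by region", ["sales", "orders"]),
    Asset("hr.headcount", "Monthly headcount by department", ["hr"]),
    Asset("sales.returns", "Returned orders with reason codes", ["sales", "returns"]),
]

for match in search(catalog, "sales orders"):
    print(match.name, "-", match.description)
```

Searching for "sales orders" here surfaces both sales assets, including the returns table that a colleague may already have built, which is exactly the kind of discovery that prevents duplicate work.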


“I used an available dataset for an important project, only to discover that it had quality issues. I had to redo the whole project.”


Frustrating waste of time and energy #2 = unknowingly using bad quality materials, rendering the finished product useless. How were you supposed to check the quality? It all looks the same… until it falls apart.



Data catalog features should include usage data in entries, so you can see how often and for what this data asset has been used. They should also include user-generated information like ratings and reviews, enabling you to leverage the combined experience of your organization’s data consumers when choosing what data you’re going to rely on.
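One simple way to put usage data and ratings to work is to rank candidate datasets by them. The sketch below assumes each asset carries a usage count and a list of star ratings; the names `AssetStats` and `rank_by_trust`, and the choice to sort by rating first and usage second, are illustrative assumptions rather than how any specific catalog scores its assets.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssetStats:
    """Usage and rating signals for one data asset (hypothetical fields)."""
    name: str
    times_used: int
    ratings: List[int] = field(default_factory=list)

    @property
    def average_rating(self) -> float:
        # Unrated assets score 0.0 so they sort below anything with ratings.
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0


def rank_by_trust(assets: List[AssetStats]) -> List[AssetStats]:
    """Sort best-rated first; break ties with the more frequently used asset."""
    return sorted(assets, key=lambda a: (a.average_rating, a.times_used), reverse=True)


candidates = [
    AssetStats("customers_v1", times_used=120, ratings=[2, 3, 2]),
    AssetStats("customers_clean", times_used=85, ratings=[5, 4, 5]),
    AssetStats("customers_tmp", times_used=3),
]

for asset in rank_by_trust(candidates):
    print(f"{asset.name}: rating {asset.average_rating:.1f}, used {asset.times_used} times")
```

In this made-up data, the heavily used `customers_v1` drops below `customers_clean` because its reviews are poor, which is the warning sign you would want to see before building a project on it.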


“Whenever I have a question about a data asset, it takes forever to get in touch with someone who can give me an answer.”


Who is the data owner? Who is the data steward? Is there a subject matter expert in the house – or did she jump ship a month ago without a forwarding address?


A good data catalog platform clearly identifies the important roles for every data asset. A really good data catalog provides communication channels from within the catalog itself for getting answers and clarifications from those responsible for the data asset. A really, really good data catalog records these questions and conversations within the data asset entry, making this valuable tribal knowledge available to every future user who looks at the entry.


“It takes me SO long to find the right data for my project.”


Without a data catalog, getting the data you need for a project is reminiscent of a bad blind date. Based on whatever documentation exists, plus some talking to colleagues, you’re set up with a data asset you think has promise. You psych yourself up, make yourself look all snazzy, go to the appointed place, spend a few hours with the date, and… nah. Not a chance. How did anyone even think this would go?



It would be SO much more effective – not to mention time- and energy-saving – for you to be able to learn more about your date before you meet them. What do they look like? What are their interests? What are their life goals? What do their friends – and former dates! – say about them?


A data catalog removes the blindness from your data matchmaking experience. Like the perfect online dating site (which doesn’t exist, but we can dream), you can search for what you think would make a good match, and then see your potential dates in the illuminating context of objective information about them (e.g. what accomplishments have they had, where do they usually go on dates, how many people have gone out with them) and subjective information (e.g. what prior relationships say about them). You can even ask questions to people who know them!


While this doesn’t guarantee success (you do have to meet them before you can commit to a long-term relationship), your chances of finding Mr./Ms. Right Data go way up and the time required to do so goes way down.


Many hands (or the right tools) make light work

If you were Egyptian Vizier Rekhmire planning the impressive structure that would house your future mummified body, you might have had the luxury of “many hands.” Not so if you’re an enterprise business user, BI self-service user, or BI analyst. All you have are your own two hands and brain – and the brains of any relevant data users or experts you can get your hands on.


But if you also have a data catalog platform, it doesn’t matter. The right power tools trump manual labor any day. Spectacular data construction, here you come!

Create company-wide consistency with a self-creating, self-updating data catalog
See for yourself how Octopai's automated data catalog, with built-in collaboration and integrated data lineage will change the way your data users work
Schedule a Personalized Demo

Is your organization Octopied?

With effortless onboarding and no implementation costs, Octopai’s data intelligence platform gives you unprecedented visibility and trust into the most complex data environments.


Announcement! We are happy to share that Octopai has been acquired by Cloudera.