What is Automated Data Discovery?

Put on your virtual safari helmet and sharpen your navigation skills. We’re going on an expedition of discovery.


This is no ordinary, physical journey. We’re exploring the true heart of darkness: the uncharted territory of your data landscape. What we will discover can be both frightening and enlightening, often at the same time.


The journey will not be easy. Imagine being parachuted into the middle of a jungle, at night, and instructed to draw a detailed map of the entire continent by the end of the week. You don’t even know which continent you’re on, much less where you are on it, or even what direction you’re facing.


That’s what the journey of data discovery can feel like, especially for large organizations with complex data environments.


Data Discovery, Defined

Let’s face it: If your organization is like many enterprises these days, you’re drowning in data. There’s so much of it that you can’t possibly visualize it all, even in summarized form. You’re constantly at risk of missing important connections, insights, and relationships within your data that could drive innovation, inspire new business models, and inform both strategic and tactical decisions.


That’s the problem that data discovery can solve. Although there’s no single, agreed-on standard definition, in general, the data discovery process boils down to some or all of these activities:

Taking a census of your data resources: What are their names, and where do they reside?

Characterizing your data resources: Where did the data come from? What is its stated purpose? What details does it contain?

Preparing your data for analysis: What are the relationships between your data sources? What data models can be built from these relationships?

Visualizing your data landscape: By slicing and dicing the data landscape in different ways, what connections, relationships, and outliers can be found? What hidden trends can be identified?

Analyzing the data: Using statistical methods, what insights can be gained by summarizing the data? What cause-and-effect relationships can be explained? How reliable are these characterizations?
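To make the first two activities concrete, here is a minimal sketch of a "census" over a single database. It assumes the data lives in SQLite and uses Python's built-in `sqlite3` module. The `take_census` helper and the sample tables are hypothetical, invented for illustration, and a real discovery tool would cover many systems and far more metadata.

```python
import sqlite3


def take_census(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for one SQLite database.

    Hypothetical helper for illustration only.
    """
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


# Sample data, assumed purely for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, cust_name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, total REAL)")

catalog = take_census(conn)
for table, columns in catalog.items():
    print(table, columns)
```

The result is a small catalog that answers "what are the names, and what details do they contain?" for one source. Everything that follows in the discovery process builds on an inventory like this.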


This journey of enterprise data discovery cannot be performed manually. Even if you had unlimited time, money, and BI analysts at your disposal, the landscape shifts frequently as new data resources are added, others grow and evolve, and still others reach the end of their useful lives. The landscape is a constantly moving target.


Automated Data Discovery

What’s needed, therefore, is an automated tool to do most of the heavy lifting.


With automated tools, the process of finding and characterizing your data assets is done for you, in a fraction of the time it would take to do manually. Many of the preparation and visualization tasks can be automated as well, with the tool generating graphical representations of the relationships between data sources.


Automated tools work by examining and cataloging metadata. Automated metadata discovery is the secret sauce that reveals the relationships between data sources. When a tool identifies similar columns in different databases (for example, columns that hold a customer’s name even though their field names differ), the connections between data sources become easier to identify and analyze.
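One simple way to picture column matching is to compare the values two columns hold rather than their names. The sketch below scores each pair of columns by Jaccard overlap of their normalized values. It is an illustrative heuristic under assumed sample data, not a description of how any particular product works. The function names, the 0.5 threshold, and the `crm` and `billing` sources are all hypothetical.

```python
def jaccard(values_a, values_b):
    """Share of distinct normalized values that two columns have in common."""
    a = {str(v).strip().lower() for v in values_a}
    b = {str(v).strip().lower() for v in values_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar_columns(source_a, source_b, threshold=0.5):
    """Return (column_a, column_b, score) for pairs whose overlap meets the threshold."""
    matches = []
    for name_a, values_a in source_a.items():
        for name_b, values_b in source_b.items():
            score = jaccard(values_a, values_b)
            if score >= threshold:
                matches.append((name_a, name_b, round(score, 2)))
    # Strongest candidates first.
    return sorted(matches, key=lambda m: -m[2])


# Two hypothetical sources whose customer-name columns are named differently.
crm = {
    "cust_name": ["Ada Lovelace", "Alan Turing", "Grace Hopper"],
    "region": ["EU", "EU", "US"],
}
billing = {
    "client": ["Grace Hopper", "Ada Lovelace", "Edsger Dijkstra"],
    "amount": [120, 75, 300],
}

matches = find_similar_columns(crm, billing)
print(matches)
```

Here `cust_name` and `client` share two of four distinct values, so they surface as a candidate link despite their different field names. In practice a tool would likely combine several signals, such as data types, name similarity, and lineage metadata, instead of relying on value overlap alone.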


Furthermore, automated tools can execute the data discovery process over and over, thereby keeping your “map” up to date even with constant changes.
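Keeping the map current amounts to comparing each new discovery run with the previous one. The following sketch assumes each run produces a plain dictionary of table names to column names. The `diff_catalogs` helper and both snapshots are hypothetical and exist only to illustrate the idea.

```python
def diff_catalogs(previous, current):
    """Report columns that appeared or disappeared between two discovery runs."""
    prev_cols = {(table, col) for table, cols in previous.items() for col in cols}
    curr_cols = {(table, col) for table, cols in current.items() for col in cols}
    return {
        "added": sorted(curr_cols - prev_cols),
        "removed": sorted(prev_cols - curr_cols),
    }


# Two hypothetical snapshots taken a few days apart.
monday = {"customers": ["id", "cust_name"], "orders": ["order_id", "total"]}
friday = {"customers": ["id", "cust_name", "email"], "invoices": ["invoice_id"]}

changes = diff_catalogs(monday, friday)
print(changes)
```

Run on a schedule, a comparison like this flags new data resources, evolving ones, and retired ones without anyone having to redraw the map by hand.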


Finally, automated tools support data governance: the rules for how data is managed and secured. Governed data discovery can satisfy both the policies of data governance and the demands of end users for a fast, complete picture of the data environment.


Adopting automated tools for data discovery is the first step in unlocking the value hidden in your data landscape. That value can translate into a competitive advantage in the markets you serve. Isn’t it time for your organization to benefit from automated data discovery?

