The modern data stack is a data management system built out of cloud-based data systems. A given modern data stack will usually include components for data ingestion from your data sources, data transformation, data storage, data analysis and reporting.
In the ideal modern data stack, it should be easy to connect these components and no big deal to switch one component out for another. So the modern data stack provides the advantages of:
- Flexibility
- Ability to choose and use best-in-class technology
- Accessibility
- Speed (cloud-based systems can scale processing power on demand)
- Freedom (no vendor lock-in)
But what goes up must come down, and what has upsides must have downsides. Where do modern data stacks fall short?
The primary advantage of modern data stacks – that they’re made out of different interchangeable tools or services – is also their weakness. Modern data stacks face challenges when it comes to:
- Smooth functioning – when you have a system made out of disparate pieces, someone needs to connect them and make sure they stay connected
- Observability – what’s going on inside my stack? What happened (or went wrong) when data moved from this piece to that piece?
- Governance and security – unified standards need a centralized system
- Finding the address for help – when a user has a question about a data resource or process, where do they go? How easy is it to find the person or team responsible for the management of any individual tool, process or dataset?
Users of modern data stacks could use some X-ray vision to see inside the different components of the data stack and shed some light on its pieces, processes and the data flowing through it.
Fortunately, that X-ray vision exists. It’s called enterprise data lineage.
What enterprise data lineage lets you see
Data lineage is the ability to view the path of data as it flows from source to target within your data ecosystem, along with everything that happened to it along the way.
If you want to know why a report from Power BI delivered a particular number, data lineage traces that data point back through your data warehouse or lakehouse, back through your data integration tool, back to where the underlying data for that report metric first entered your system. Data lineage solutions will also show you any transformations the data underwent on its journey.
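To make the idea of tracing a report back to its sources concrete, here is a minimal sketch. It assumes lineage is stored as a simple mapping from each asset to the assets it is built from; the asset names and the `trace_upstream` helper are hypothetical and do not come from any particular lineage product.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it is built from.
# All names are illustrative, not taken from any real system.
UPSTREAM = {
    "powerbi.revenue_report": ["warehouse.fct_revenue"],
    "warehouse.fct_revenue": ["staging.orders", "staging.refunds"],
    "staging.orders": ["crm.orders"],
    "staging.refunds": ["billing.refunds"],
}


def trace_upstream(asset, upstream=UPSTREAM):
    """Return every asset that feeds `asset`, nearest first (breadth-first)."""
    seen = []
    queue = deque(upstream.get(asset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(upstream.get(current, []))
    return seen


sources = trace_upstream("powerbi.revenue_report")
print(sources)
```

In this toy graph the trace walks from the report through the warehouse table and the staging tables to the two source systems. A real lineage tool would also attach the transformation applied at each hop.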
Data lineage lets you see not only into your data’s past, but also into its future. What if you change a business process that affects the way you collect data for a certain table? How many downstream reports will show errors? How many downstream users will be ticked off? How can you do what you need to do without disrupting the enterprise’s smooth functioning? As a core tool for impact analysis, data lineage shows you the ripples your data process change will cause downstream. The view from data lineage enables you to prepare in advance for the change by modifying pipelines or notifying users who will be affected.
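Impact analysis can be pictured as the same kind of graph walk in the opposite direction. The sketch below assumes a mapping from each asset to the assets that consume it, plus a lookup of report owners; the names `DOWNSTREAM`, `OWNERS`, `impacted_assets` and `owners_to_notify` are all made up for illustration.

```python
# Hypothetical lineage edges: each asset maps to the assets that consume it.
DOWNSTREAM = {
    "crm.orders": ["staging.orders"],
    "staging.orders": ["warehouse.fct_revenue", "warehouse.dim_customer"],
    "warehouse.fct_revenue": ["powerbi.revenue_report", "powerbi.exec_dashboard"],
    "warehouse.dim_customer": ["powerbi.exec_dashboard"],
}

# Hypothetical owners of the reports at the end of the chain.
OWNERS = {
    "powerbi.revenue_report": "finance-team",
    "powerbi.exec_dashboard": "bi-team",
}


def impacted_assets(changed, downstream=DOWNSTREAM):
    """Return the set of assets that depend, directly or not, on `changed`."""
    impacted = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        for child in downstream.get(node, []):
            if child not in impacted:
                impacted.add(child)
                stack.append(child)
    return impacted


def owners_to_notify(changed):
    """Return the sorted, de-duplicated owners of every impacted asset."""
    return sorted({OWNERS[a] for a in impacted_assets(changed) if a in OWNERS})


print(owners_to_notify("crm.orders"))
```

Here a change to the source table reaches both reports, so both owning teams would be notified before the change goes live.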
Particularly relevant for modern data stacks is switching out components of the stack for others that are deemed better for what your enterprise needs today – i.e. migrating data systems. Data lineage gives you the ability to make sure those migrations go smoothly, with all data continuing to go where it needs to go, when it needs to go there.
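One way to think about a lineage-backed migration check is to compare the data flows recorded before the switch with those recorded after it. The sketch below assumes lineage can be exported as (source, target) pairs and that migrated assets keep their names under a new system prefix; both assumptions, and every name in the example, are hypothetical.

```python
def migration_gaps(before, after, old_prefix, new_prefix):
    """Compare lineage edges before and after swapping one system for another.

    Returns (missing, unexpected): flows that should exist in the new stack
    but do not, and flows that exist but have no counterpart in the old one.
    """
    def swap(name):
        # Rewrite assets of the retired system to their expected new name.
        if name.startswith(old_prefix):
            return new_prefix + name[len(old_prefix):]
        return name

    expected = {(swap(src), swap(dst)) for src, dst in before}
    return expected - after, after - expected


# Illustrative edges only: an old warehouse replaced by a new one.
before = {
    ("crm.orders", "old_dw.orders"),
    ("old_dw.orders", "powerbi.revenue_report"),
    ("old_dw.customers", "powerbi.exec_dashboard"),
}
after = {
    ("crm.orders", "new_dw.orders"),
    ("new_dw.orders", "powerbi.revenue_report"),
}

missing, unexpected = migration_gaps(before, after, "old_dw.", "new_dw.")
print(missing)
```

In this example the customers table never made it to the new warehouse, so the dashboard that depends on it shows up as a missing flow.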
Choosing a data lineage solution for the modern data stack
There is a host of general things to look out for when implementing data lineage in any environment, but here are some specific criteria for lineage for modern data stacks:
Integrates with a wide range of cloud-based technologies
Since the flexibility to change technologies and tools is one of the benefits of a modern data stack, you want your enterprise data lineage solution to support both the tools you use now, and any tools you may decide to use in the future. And you want these data lineage integrations to be out-of-the-box, plug-and-play – without the need for developer resources.
A good sign is if you see that the data lineage solution providers are continually adding new integrations for new cloud-based technologies. That means that keeping up-to-date is a priority for them. So if a new technology bursts onto the market in another year and your enterprise decides a few months later to use it, it’s likely that this data lineage solution will have developed an integration by then.
And if the solution doesn’t support all of your technologies out-of-the-box, there should still be an option to enrich the automated lineage with additional relationships between data assets to make sure to cover all of your ‘homegrown’ applications and processes as well.
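Enriching automated lineage with hand-declared relationships could look something like the following sketch. It assumes edges are (source, target) pairs and tags each one with where it came from, so that manually added links stay distinguishable; `merge_lineage` and the asset names are hypothetical.

```python
def merge_lineage(automated, manual):
    """Combine scanned edges with hand-declared ones, recording provenance.

    Edges found by the scanner keep the "automated" tag even if someone
    also declared them by hand.
    """
    merged = {edge: "automated" for edge in automated}
    for edge in manual:
        merged.setdefault(edge, "manual")
    return merged


# Edges a scanner might discover from supported tools (illustrative).
automated = {
    ("crm.orders", "warehouse.orders"),
    ("warehouse.orders", "powerbi.revenue_report"),
}

# Edges declared by hand for a homegrown script the scanner cannot read.
manual = {
    ("legacy_script.export_csv", "warehouse.orders"),
    ("crm.orders", "warehouse.orders"),  # duplicate of a scanned edge
}

lineage = merge_lineage(automated, manual)
for edge, origin in sorted(lineage.items()):
    print(edge, origin)
```

Keeping the provenance tag makes it easy to review manual edges later, for instance when an out-of-the-box integration for that tool becomes available.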
User-friendly interface
Most users of modern data stacks have data democratization and self-serve data access on their priority lists. If you want even your business users to be able to take more advantage of the insights offered by your business data lineage solution (and take less advantage of your data team’s time), choose a data lineage solution that is easy and intuitive to use.
As automated as possible
Speed to insight is a prime motivation for moving to cloud-based technologies. The components that make up a modern data stack are usually characterized by a high level of automation that leaves humans free to do the truly intelligent parts of business intelligence. Your data lineage tool should be no different. Automated data lineage uses analysis of your system’s metadata to trace your data’s path, without having to burden your data team with manual work.
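As a rough illustration of deriving lineage from metadata rather than from manual documentation, the sketch below pulls source and target tables out of a stored SQL statement. It is deliberately naive: it uses regular expressions where production tools would rely on a full SQL parser, and the query, table names and `edges_from_sql` helper are all hypothetical.

```python
import re

# Naive patterns: they handle simple statements only and will miss CTEs,
# subqueries, quoted identifiers and dialect-specific syntax.
TARGET_RE = re.compile(
    r"\b(?:insert\s+into|create\s+(?:or\s+replace\s+)?(?:table|view))\s+([\w.]+)",
    re.IGNORECASE,
)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def edges_from_sql(sql):
    """Return (source, target) lineage edges found in one SQL statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        return []  # statement does not write to a table or view
    sources = dict.fromkeys(SOURCE_RE.findall(sql))  # de-duplicate, keep order
    return [(src, target.group(1)) for src in sources]


# Illustrative transformation query, as it might appear in a query log.
QUERY = """
CREATE TABLE warehouse.fct_revenue AS
SELECT o.id, o.amount - r.amount AS net
FROM staging.orders o
LEFT JOIN staging.refunds r ON r.order_id = o.id
"""

for edge in edges_from_sql(QUERY):
    print(edge)
```

Run over every statement in a query log or transformation repository, this kind of extraction builds the lineage graph without anyone drawing it by hand.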
Data lineage that integrates with a data catalog
As mentioned above, finding the address for help with a data resource is a particular challenge with the modular nature of the modern data stack. The team or individual in charge of your integration tool may not be the same as the one in charge of your data warehouse. And with the steady rise in remote work that cloud-based technology facilitates, you can’t just go down to the IT or data engineering department and ask.
For this reason, a data catalog that imparts order to your entire data landscape is indispensable, with comprehensive data asset entries featuring definitions, data owner and steward, user ratings and reviews, security status and more. And when your data catalog solution contains or integrates with data lineage, this data catalog lineage gives you a one-stop shop to get answers about your data.
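A small sketch of what the catalog-plus-lineage combination offers: given a report, look up who is responsible for it and for the assets directly feeding it. The `CatalogEntry` fields mirror the entry attributes mentioned above, but the structure, the names and the `contacts_for` helper are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    definition: str
    owner: str
    steward: str


# Hypothetical catalog entries keyed by asset name.
CATALOG = {
    "powerbi.revenue_report": CatalogEntry(
        "Monthly net revenue by region", "finance-team", "a.analyst"
    ),
    "warehouse.fct_revenue": CatalogEntry(
        "Net revenue fact table", "data-engineering", "d.engineer"
    ),
}

# Hypothetical lineage: each asset maps to its direct upstream assets.
UPSTREAM = {
    "powerbi.revenue_report": ["warehouse.fct_revenue"],
    "warehouse.fct_revenue": ["staging.orders"],
}


def contacts_for(asset):
    """Return (asset, owner, steward) for the asset and its direct upstream
    assets, skipping any asset without a catalog entry."""
    assets = [asset] + UPSTREAM.get(asset, [])
    return [(a, CATALOG[a].owner, CATALOG[a].steward) for a in assets if a in CATALOG]


for row in contacts_for("powerbi.revenue_report"):
    print(row)
```

A user with a question about the report gets both the report owner and the owner of the table behind it from a single lookup.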
Modern data lineage for a modern data stack
Both “legacy” and “modern” have their place in data systems. Both have their upsides and their downsides. For either one – but especially for the fast-moving, multi-technology, modern data stack – being able to see what’s going on under the hood is critical. Seeing is knowledge, and knowledge is power.
Power to you!