Using Data Lineage in a Hybrid BI Environment

If only fairy godmothers came around offering to turn our BI legacy systems into modern, cloud-based ones with the touch of a wand and a “Bibbidi-bobbidi-boo!”


Alas, it’s more likely that you’ll turn into a pumpkin.


BI modernization is a bumpy road. Even once your company decides to move in that direction, your legacy systems can’t just magically disappear or migrate themselves. Change is usually incremental: migrating all that data is a slow process that can’t be done all at once, so you inevitably have one foot in the old system and one in the new at the same time – for a loooooong time.


On top of the issue of migration, more often than not some of your BI systems are cloud-based and some are on-prem. This leaves you working in a hybrid BI environment: a conglomerate of legacy and modernized, proprietary and SaaS, often from multiple vendors.


Data and metadata are the glue that connects all these different hybrid system components. But tracking data through this type of environment can get rather… sticky. 


Where’s that fairy godmother when you need her?

The Magic of Data Lineage



Okay, so data lineage won’t turn a pumpkin into a carriage or straw into gold, or any of those other magical transformations. 


But you don’t need that.


You need clarity. You need to know exactly where your data originated, where it exists today, any changes it has undergone, and what it’s connected to.


You need the big picture to appear on command: “Magic mirror in the data – give me answers NOW, not later!”


(And if you’re about to complain that that doesn’t rhyme, just try saying it with a Brooklyn accent.)


The magic mirror of data lineage is the key to efficient, resilient hybrid BI environments, especially in (but not limited to) the following four key areas: 

  • BI system migration
  • Regulatory compliance 
  • Change and impact analysis
  • Fixing reports


Let’s take a look at each one.



Migrating Data to the Cloud

Moving data to the cloud brings with it an opportunity – and a challenge – for increased data integrity. You do not want to lose what you should keep, i.e., all your important data. At the same time, you do not want to keep what you should lose, like duplicate or unused data.


To succeed, you need clarity about every piece of data: where did it come from? Where is it being used? Is it okay to get rid of it?


It’s kind of like packing up a family with ten kids for a big cross-country move. You want to weed out toys that no one has touched in five years, while at the same time making sure not to throw out Billy or Jane’s most treasured possessions.


What would make this process even easier and more successful? If every item in your home were labeled with its owner’s name. Then the job of tracing each item back to its source, assessing its current usage, and gauging the impact of tossing it in the garbage would get 100 times more efficient.


Automated data lineage does the exact same thing as you prepare and move data from an on-prem system to a cloud-based one.


A complete understanding of where your data came from, its current usage, and the benefit – or detriment – of getting rid of it turns a potentially chaotic migration into a spring cleaning for your data. 
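To make the spring cleaning idea a little more concrete, here is a minimal sketch of how a lineage map could sort assets into "migrate" and "review before tossing" piles. Everything in it is an assumption for illustration: the asset names, the `LINEAGE` dictionary, and the rule that an asset is worth keeping if it eventually feeds a report. A real lineage tool would give you a far richer graph than this.

```python
# Hypothetical lineage map: each asset -> the assets that read from it.
LINEAGE = {
    "crm.customers": ["etl.load_customers"],
    "etl.load_customers": ["dw.dim_customer"],
    "dw.dim_customer": ["report.sales_by_region"],
    "legacy.customers_backup_2015": [],
    "report.sales_by_region": [],
}

# Hypothetical set of assets that people actually look at.
REPORTS = {"report.sales_by_region"}


def feeds_a_report(asset, lineage, reports):
    """Return True if the asset is a report or anything downstream of it is."""
    stack, seen = [asset], set()
    while stack:
        node = stack.pop()
        if node in reports:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(lineage.get(node, []))
    return False


def migration_plan(lineage, reports):
    """Split assets into those still in use and those worth a second look."""
    keep = sorted(a for a in lineage if feeds_a_report(a, lineage, reports))
    review = sorted(a for a in lineage if a not in keep)
    return keep, review


keep, review = migration_plan(LINEAGE, REPORTS)
print("Migrate:", keep)
print("Review before tossing:", review)  # ['legacy.customers_backup_2015']
```

Note that the sketch only flags the unused backup table for review. Whether it actually goes in the garbage is still a decision for Billy, Jane, or whoever owns it.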


Maintaining Regulatory Compliance and Data Governance

Who needs to be assured that you have proper data management? 


Just about everybody who is connected in any way to your data.


Regulatory bodies are concerned with your horizontal data lineage: how data entered your system, where it goes, and whether what’s showing in the report is accurate.


Business end users are concerned with your vertical data lineage: what transformations did this piece of data go through before it ended up in a report? Unless they can trust that the data represents the full truth, there’s no point in drawing conclusions from it.


In a hybrid BI environment, pulling together varied data sources and varied ownership levels, and forcing data through varied hoops in varied applications, makes data governance tricky. In a 2020 report from TDWI (Transforming Data with Intelligence), only 9% of companies were very satisfied with their governance and data lineage tracking across a hybrid environment of on-premises and cloud-based BI/DW systems, while 42% said they could use a major upgrade.


The key to improving hybrid BI environments through efficient data management is a centralized metadata management system. There, you can see an overview of your data’s journey, from source to target, which helps companies maintain compliance by safeguarding the validity of the data. Data lineage is the Rosetta Stone for creating that system effectively, giving you the what, when, where, who, how, and why of your data. It can show you the most accurate picture of your data at any given point in time.


Additionally, once you have a centralized metadata management system in place, data lineage provides an audit-ready environment in which companies can confidently demonstrate compliance. 


Performing Impact Analysis

The variation in a hybrid BI environment makes it ever more likely that a (seemingly) innocuous change to part of the system will bring frenzied knocks on your door. “Why are my numbers all wrong?!” “Why can’t I access the figures I need anymore?!” “What did you DO??!!”



To avoid situations like this, data lineage should be a prerequisite to impact analysis. It gives you visibility into what will happen to data within your ETL, database, and reporting systems if, for example, you make a change in your source system.
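For the curious, impact analysis over lineage boils down to walking the graph downstream from whatever you plan to change. The sketch below assumes a hypothetical `DOWNSTREAM` map with made-up asset names. It is a toy illustration of the idea, not how any particular lineage product works under the hood.

```python
from collections import deque

# Hypothetical lineage edges: asset -> assets that consume it directly.
DOWNSTREAM = {
    "erp.orders": ["etl.orders_job"],
    "etl.orders_job": ["dw.fact_orders"],
    "dw.fact_orders": ["report.revenue", "report.backlog"],
    "hr.employees": ["report.headcount"],
}


def impacted_by(changed, downstream):
    """Breadth-first walk collecting everything downstream of `changed`."""
    impacted, queue = [], deque([changed])
    seen = {changed}
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


print(impacted_by("erp.orders", DOWNSTREAM))
# ['etl.orders_job', 'dw.fact_orders', 'report.revenue', 'report.backlog']
```

In this made-up example, a change to `erp.orders` touches one ETL job, one warehouse table, and two reports, while the headcount report is left alone. That is the list of doors to knock on before the change, rather than after.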


With that level of insight and foresight, you can make wise decisions and prevent the rest of your company’s staff from losing their cool outside your office door.


Conducting Root Cause Analysis to Fix Reports 

No one is perfect. Mistakes happen. Inconsistencies are part of life. 


But when the responsibility for rectifying the mistakes and solving the inconsistencies in BI reporting is yours, an innocent blooper can toss you into the pressure cooker.



Hybrid BI environments increase the variation in the way data is loaded, transformed, transmitted, and analyzed. This escalates the likelihood of inconsistent and erroneous data.


Data lineage is critical to resolving those inconsistencies. Only when you see all the steps the data took can you identify the misstep.


With root cause analysis, you can reverse engineer the issue. You can see each ETL process and database that feeds the report in question, then quickly trace the error to its root and work out how to rectify it. This is especially important in a hybrid BI environment, where many more systems are in play.
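Root cause analysis is the same walk in the opposite direction: start at the broken report and head upstream. The sketch below assumes a hypothetical `UPSTREAM` map and a hypothetical `FAILING` set of assets that flunked some data-quality check. It treats as a likely root cause any failing asset whose own sources all look healthy. That rule is a simplification chosen for illustration.

```python
# Hypothetical lineage: asset -> assets it reads from directly.
UPSTREAM = {
    "report.revenue": ["dw.fact_orders"],
    "dw.fact_orders": ["etl.orders_job"],
    "etl.orders_job": ["erp.orders", "crm.customers"],
    "erp.orders": [],
    "crm.customers": [],
}

# Hypothetical result of a data-quality check run on every asset.
FAILING = {"report.revenue", "dw.fact_orders", "etl.orders_job"}


def upstream_of(asset, upstream):
    """Every asset the given one depends on, directly or indirectly."""
    found, stack = set(), [asset]
    while stack:
        for source in upstream.get(stack.pop(), []):
            if source not in found:
                found.add(source)
                stack.append(source)
    return found


def root_causes(report, upstream, failing):
    """Failing assets in the report's lineage whose own sources are all healthy."""
    candidates = (upstream_of(report, upstream) | {report}) & failing
    return sorted(
        a for a in candidates
        if not any(s in failing for s in upstream.get(a, []))
    )


print(root_causes("report.revenue", UPSTREAM, FAILING))  # ['etl.orders_job']
```

Here the report and the warehouse table are both wrong, but only because they inherit bad data. The walk points at the ETL job as the first place where healthy inputs turn into unhealthy outputs.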


The Magic in Automated Data Lineage

While data lineage *can* be done manually, it typically takes days or weeks to complete the mapping required to perform root cause analysis – and even longer for impact analysis. In a hybrid BI environment where data tends to be more scattered, count on any manual lineage process taking even longer and being more susceptible to error.   


In contrast, an automated data lineage tool allows you to perform these tasks in a matter of minutes. The impact of a hybrid BI environment, even with scattered hybrid data management, is barely noticeable when you have automated data lineage in your toolbox.   


By utilizing automated data lineage, BI teams can quickly understand everything there is to know about their data, no matter where it may be. Less time spent on data lineage means more time for analysis, strategy, and all the truly intelligent parts of business intelligence.


Plus, you’ll be able to leave the office before you turn into a pumpkin.



Can anyone say “BI-bbidi-bobbidi-boo”?


