The Step By Step Guide on How To Implement Data Lineage


Data lineage by any other name… would still be data lineage.

If you are the one who answers the “where did this number come from?” and “what does this number mean?” questions at your company, you are almost certainly doing, or in need of doing, some type of data lineage.


For every BI team, however, there comes a point when it makes sense to be more systematic about your data lineage. It is time to implement a data lineage solution.


So, what do you need in order to do that? Let’s take it step by step:


Step 1: Get your priorities (and use cases) straight


The first step toward implementing data lineage systematically is to establish clearly how you currently use, or plan to use, data lineage.


Here are some possibilities (for more ideas, feel free to check out our data lineage use cases):


Impact analysis

Look before you leap. Haste makes waste. A stitch in time saves nine. An ounce of prevention is worth a pound of cure.


The folks who originally coined these sayings probably couldn’t have foreseen business intelligence … but nowhere are these words more true.


If you make a change to a process without checking the impact first, the consequences can be pretty terrible. You will be left working overtime to fix it (not to mention appeasing irate business users). You want a clear idea of what the impact of a change will be before you make it.

Data lineage traces the flow of data through your BI environment and processes, enabling you to see what kind of impact a change to one part of the system will have on the other parts. 
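
To make the idea concrete, here is a minimal sketch of impact analysis as a walk through a lineage graph. Everything in it is an assumption for illustration: the asset names are made up, and a real lineage tool would build the graph by scanning your systems rather than from a hand-written dictionary.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its value.
LINEAGE = {
    "crm.customers": ["stage.customers"],
    "stage.customers": ["dwh.dim_customer"],
    "dwh.dim_customer": ["report.churn", "report.revenue_by_region"],
    "erp.orders": ["dwh.fact_orders"],
    "dwh.fact_orders": ["report.revenue_by_region"],
}


def downstream_impact(graph, changed):
    """Return every asset reachable downstream of `changed` (breadth-first)."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Which assets could break if we change the staging table?
print(downstream_impact(LINEAGE, "stage.customers"))
# -> ['dwh.dim_customer', 'report.churn', 'report.revenue_by_region']
```

Anything on that list is worth checking (and its owners worth warning) before the change goes live.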

Root cause analysis

This is the issue that causes your team to work overtime and lose sleep at night. This same issue inspires frenzied “where did this number come from?” calls from business users.


When there is a mistake in a report, it has to come from somewhere. Your job as BI Manager is to figure out how and where the data was mishandled or misinterpreted.


Data lineage maps out the data trail. Pick any data point (like your questionable report metric), and trace it back upstream to see exactly what it was based on, what processes were involved, and where things may have gone wrong.
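
Tracing a metric back upstream can be sketched in a few lines. This is an illustrative sketch only: the column and process names are hypothetical, and it assumes the lineage contains no cycles.

```python
# Hypothetical lineage records: (source, process, target).
EDGES = [
    ("erp.orders.amount", "etl_load_orders", "stage.orders.amount"),
    ("stage.orders.amount", "etl_build_fact", "dwh.fact_orders.net_amount"),
    ("ref.fx_rates.rate", "etl_build_fact", "dwh.fact_orders.net_amount"),
    ("dwh.fact_orders.net_amount", "report_query", "report.revenue.total"),
]


def trace_upstream(edges, target, depth=0):
    """Yield (depth, source, process) for every step that feeds `target`.

    Assumes the lineage is acyclic; a cycle would recurse forever.
    """
    for source, process, dest in edges:
        if dest == target:
            yield depth, source, process
            yield from trace_upstream(edges, source, depth + 1)


# Print the trail behind the questionable report metric, indented by distance.
for depth, source, process in trace_upstream(EDGES, "report.revenue.total"):
    print("  " * depth + f"{source}  (via {process})")
```

Each line of the output is a place to look: a source the number was based on, and the process that carried it one step closer to the report.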


Explainability

Especially for financial institutions, explainability is vital. Why did you value your credit card portfolio at 14.6 million? Why did you value your real estate holdings at 3.1 million?


Data lineage enables you to not only get the numbers right, but to easily show how you got the numbers right.



Regulatory compliance

Depending on the industry, your business may be filled with lots of little letters that can cause a big headache: GDPR, CCPA, HIPAA, IFRS…  the list goes on. If you don’t want to spend all your BI hours trying not to drown in the alphabet soup, you need a way to efficiently access, navigate and manage your information. You also need a way to demonstrate to compliance auditors that you have fulfilled the requirements.

Data lineage in data governance is your key to fulfilling the regulatory standards for your industry – and to proving your compliance. For example, when a client asks you to delete all of their PII under the GDPR, data lineage tools make it simple to locate every instance of that PII. This ensures that you not only send the data on a one-way trip to nowhere, but can also prove that you did so.
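
As a rough sketch of the PII example, column-level lineage lets you start from the columns you know hold personal data and collect every place that data flows to. The column names below are hypothetical, and the sketch assumes someone has already flagged the original PII columns.

```python
from collections import defaultdict

# Hypothetical column-level lineage: source column -> columns derived from it.
COLUMN_LINEAGE = {
    "crm.contacts.email": ["stage.contacts.email", "marketing.leads.email"],
    "stage.contacts.email": ["dwh.dim_customer.email"],
    "dwh.dim_customer.email": ["report.newsletter.recipient"],
    "crm.contacts.signup_date": ["dwh.dim_customer.signup_date"],
}

# Assumption: the original PII columns have already been identified.
PII_SOURCES = ["crm.contacts.email"]


def pii_locations(lineage, sources):
    """Group every column that holds (or derives from) PII by its system."""
    found, stack = set(sources), list(sources)
    while stack:
        for child in lineage.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    by_system = defaultdict(list)
    for column in sorted(found):
        # The system name is the part before the first dot.
        by_system[column.split(".")[0]].append(column)
    return dict(by_system)


# A per-system checklist for the deletion request.
for system, columns in sorted(pii_locations(COLUMN_LINEAGE, PII_SOURCES).items()):
    print(system, columns)
```

The resulting checklist doubles as evidence for the auditor: it shows which systems were covered when the request was fulfilled.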


Business insights

Business intelligence isn’t supposed to be reactive only. It is ideally supposed to be proactive: analyzing trends and directing the business on where to go to maximize profits. But to make informed, data-based decisions, you need data you can trust – and a way to understand it.


Data lineage solutions can pinpoint the source of data inconsistencies. Used correctly, they can help eliminate dirty data and provide your business with a single source of truth. When your BI team is confident in your data, the business will feel secure in your insights.

Step 2: Get buy-in from management

Once you’ve clarified the ways in which you will utilize data lineage, it’s time to go to management and get the green light for data lineage solutions. Go prepared – for each use case, explain the role a data lineage solution will play and how implementing it will save the company time and money.


Bringing real-world examples of how data lineage techniques support the objectives of companies similar to yours is often a critical component in getting a “yes.” Feel free to use any of our data lineage case studies if they help you make the point.

Step 3: Research data lineage solutions

Once you have the go-ahead from management, it’s time to find the right solution for your company. Let your priorities and use cases (see Step 1) be the first yardstick for filtering the available tools. If your primary use case is answering the questions of business users and auditors, the speed of the tool may be more important to you. If your primary use case is deriving insights for planning future business strategy, look into a tool that specializes in this.


In addition, it is wise to consider the following factors:

What technologies does it support?

At the risk of stating the obvious: your data lineage solution will need to support your company’s data systems.

Do you use Oracle? Cognos? Informatica? Azure Data Factory? Snowflake? Or some awkward proprietary system cobbled together over the course of the last 15 years?

Ideally, you want a data lineage solution that will support every single data technology tool you use, providing full visibility end-to-end. You don’t want black boxes in the middle of your data lineage charts.
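
One simple way to spot potential black boxes while comparing tools is to check your own technology inventory against each candidate's list of supported technologies. The sketch below uses made-up tool names and connector lists; the real lists would have to come from each vendor's documentation.

```python
# Hypothetical inventory of the technologies in your own BI landscape.
OUR_STACK = {"Oracle", "Informatica", "Snowflake", "Cognos", "in-house scheduler"}

# Hypothetical connector lists, for illustration only.
CANDIDATES = {
    "Tool A": {"Oracle", "Informatica", "Snowflake", "Cognos"},
    "Tool B": {"Oracle", "Snowflake"},
}


def black_boxes(stack, supported):
    """Return the technologies in `stack` that a tool cannot see into."""
    return sorted(stack - supported)


for tool, supported in CANDIDATES.items():
    print(tool, "cannot see into:", black_boxes(OUR_STACK, supported))
```

The shorter the list for a given tool, the fewer gaps you can expect in its lineage charts.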


If you are coping with the awkward proprietary system, you may be forced to find a technology-agnostic solution. What those provide in flexibility, however, they lose in accuracy. (Yet another reason to schedule your next “why we need to migrate to a standard BI platform” meeting with management.)


In which BI environment is it designed to function best?

What does your BI environment look like? Are you working with a single vendor? Do you integrate systems from multiple vendors? Are your systems on-prem, cloud-based or a type of hybrid configuration?


Your answer will inform the level of data lineage tool that you need. If you use only one on-premises legacy system, you may only need a simple data lineage tool. On the other hand, if your data passes through multiple systems while flying to the cloud and back, you need data lineage that can follow your data wherever it goes.


What dimensions of data lineage does it show?

Data lineage is often divided into horizontal vs. vertical data lineage. Horizontal data lineage is concerned with the systems involved in the journey from source to destination. Horizontal lineage is what enables you to answer questions like:

  • Where did the data come from?
  • What stops did the data make on its journey (MRR, Stage, DWH)?
  • Which reports, visualizations or analyses use the data?

Vertical data lineage is all about the transformations that occur along the way: the minutiae of the various ETL processes a data field encounters on its journey. Vertical ETL data lineage is what enables you to answer questions like:

  • What is the source of an attribute in my report?
  • How was this KPI calculated?
  • Why do these two “identical” fields have different values?
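
The two views can be pictured as two readings of the same record. In this minimal sketch (the field, its hops and its transformations are all invented for illustration), the horizontal view lists the systems a field passed through, while the vertical view lists what happened to it at each stop.

```python
from dataclasses import dataclass, field


@dataclass
class Hop:
    """One stop on a field's journey."""
    system: str          # horizontal: where the data sits
    column: str
    transformation: str  # vertical: what happened to it on arrival


@dataclass
class FieldLineage:
    name: str
    hops: list = field(default_factory=list)

    def horizontal(self):
        """Which systems did the data pass through, in order?"""
        return [hop.system for hop in self.hops]

    def vertical(self):
        """How was the value changed at each stop?"""
        return [(hop.column, hop.transformation) for hop in self.hops]


# Hypothetical example data, for illustration only.
net_revenue = FieldLineage("net_revenue", [
    Hop("MRR", "orders.amount", "copied from the source system"),
    Hop("Stage", "stg_orders.amount", "currency converted to USD"),
    Hop("DWH", "fact_orders.net_amount", "refunds subtracted"),
    Hop("Report", "Net Revenue", "summed per month"),
])

print(" -> ".join(net_revenue.horizontal()))
for column, transformation in net_revenue.vertical():
    print(f"{column}: {transformation}")
```

Two “identical” fields with different values would show the same horizontal path but differ somewhere in the vertical list.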


At Octopai, we provide our users with three different dimensions of data lineage that encompass horizontal and vertical while going even deeper into the data.

Cross-system lineage provides high-level, end-to-end lineage at the system level from the entry point into the BI landscape, all the way to reporting and analytics. 

End-to-end column lineage enables you to “zoom in” on the cross-system lineage, detailing column-to-column lineage between systems from across the BI landscape.

Inner-system lineage details the column-level lineage within an ETL process, report or database object.


Think about which dimensions of data lineage you need access to, and compare them with the functionality of the data lineage system that you’re evaluating.


How user-friendly is it?

Even if your data lineage tool is meant for BI’s eyes only, you still want it to be somewhat intuitive. (Even geeky types can be intimidated by raw intellect.)


If business users are going to have direct access to the data lineage tool, then you definitely want it to be intuitive. How clearly presented is the information? How hard is it to find answers? Does it require extensive training to use and understand?


In addition, how long does setup take? Is implementing a data lineage solution going to occupy days, weeks, or months? Octopai takes up to one hour of client time to set up, and within 48 hours you receive access to all the data lineage you’ve ever dreamed of. When you can set Octopai up during a Friday lunch break and come into the office on Monday to see complete data lineage ready and waiting for you…

Step 4: Choose and implement a solution

This is where it gets exciting! (Or tricky, if your BI decision-makers have a difference of opinion.)


But once you get over that hurdle and implement the data lineage process that is right for you, the fun really starts.

Answers. Clarity. Accuracy. Confidence. Foresight. Ahhh… data lineage, how we have longed for you.


