Data lineage analysis is the applied understanding of how data flows throughout your data environment. Applications include understanding where errors came from, the impact a process change had or will have on data flow and use, and how to most efficiently migrate data to a new system.
It is one thing to see and understand data flow. It is another to apply that understanding to practical issues that your organization faces. Data lineage analysis is that essential stage where the theoretical becomes practical.
Data lineage and root cause analysis
When data flow goes awry and causes reporting errors, integration errors or any other type of data error, tracking down the source of the error becomes a top priority.
Root cause analysis is a backward-looking procedure, where you start from the problematic data point under scrutiny and trace backward until you find the root of the problem.
Data lineage visualizations are key tools in root cause analysis, letting you follow the erroneous data’s path backward through your systems and pinpoint where something went wrong. Constructing data lineage manually for even one data point can take days, so automated data lineage is critical to performing root cause analysis quickly and accurately.
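As a rough illustration of the backward trace described above, the following Python sketch walks a small lineage graph upstream from a problematic report. The dataset names and the UPSTREAM mapping are hypothetical, invented for this example; in practice the graph would come from whatever lineage tool or metadata store you use.

```python
from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is built from.
UPSTREAM = {
    "revenue_report": ["sales_mart"],
    "sales_mart": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
    "fx_rates": [],
}


def trace_upstream(upstream, start):
    """Return every dataset that feeds `start`, nearest first (breadth-first)."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in upstream.get(node, []):
            if parent not in seen:  # guard against revisiting shared or cyclic inputs
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def root_sources(upstream, start):
    """Return the upstream datasets that have no inputs of their own."""
    return sorted(n for n in trace_upstream(upstream, start) if not upstream.get(n))


candidates = trace_upstream(UPSTREAM, "revenue_report")
sources = root_sources(UPSTREAM, "revenue_report")
```

Here candidates lists the datasets to inspect in order of distance from the broken report, and sources narrows that list to the original inputs. The trace only identifies where to look; deciding which step actually introduced the error still requires checking the data at each candidate.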
Data lineage and impact analysis
In contrast to the backward-looking root cause analysis, impact analysis is a forward-looking procedure intended to predict the likely impact of changes in data processes. Is a field slated to be deprecated? Are you changing service providers? Is a major data owner leaving the company? Knowing what business or organizational impact these changes will have lets you be proactive in preparing for the changes wisely.
Data lineage visualizations are key tools in impact analysis, enabling you to look closely at the flow of the data in question, follow it as it cascades through your data landscape, and identify what systems and roles you need to address prior to the change to preempt any problems.
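The forward-looking trace can be sketched in the same way. The Python example below follows a changed field downstream and groups the affected assets by owner, which is one simple way to identify the systems and roles to address before a change. The field name, the DOWNSTREAM mapping and the OWNERS table are all hypothetical and stand in for real lineage metadata.

```python
# Hypothetical lineage: each field or dataset maps to its direct consumers.
DOWNSTREAM = {
    "crm.customer.region": ["customer_dim"],
    "customer_dim": ["sales_mart", "churn_model"],
    "sales_mart": ["revenue_report"],
    "churn_model": [],
    "revenue_report": [],
}

# Hypothetical ownership table: which team is responsible for each asset.
OWNERS = {
    "customer_dim": "data engineering",
    "sales_mart": "data engineering",
    "churn_model": "data science",
    "revenue_report": "finance",
}


def impacted_assets(downstream, changed):
    """Return the set of assets that directly or indirectly consume `changed`."""
    seen = set()
    stack = [changed]
    while stack:
        node = stack.pop()
        for child in downstream.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def owners_to_notify(downstream, owners, changed):
    """Group the impacted assets by owner; assets without an owner go under 'unknown'."""
    by_owner = {}
    for asset in sorted(impacted_assets(downstream, changed)):
        by_owner.setdefault(owners.get(asset, "unknown"), []).append(asset)
    return by_owner


notify = owners_to_notify(DOWNSTREAM, OWNERS, "crm.customer.region")
```

In this sketch, deprecating the region field touches assets owned by three teams, each of which would need to be consulted before the change goes ahead.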
Creating effective data lineage analysis reports
An effective data lineage analysis report is specific, detailed and visual. Whether it is a root cause analysis report or an impact analysis report, the viewer should see immediately where the problem areas lie and how the data flows to them from the point of initial inquiry.
Effective data lineage analysis reports should not only show the problems but also suggest fixes for them. Include short-term, immediate fixes, and if those will not eliminate the problem or prevent it from recurring, include long-term fixes as well.
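One possible way to keep reports consistent is to capture these elements in a small data structure. The Python sketch below is illustrative only: the Finding and LineageReport classes and their field names are assumptions made for this example, and the plain-text rendering stands in for the visual diagram a real report would include.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    asset: str            # where in the flow the problem lies
    problem: str          # what is wrong, stated specifically
    short_term_fix: str   # immediate remedy
    long_term_fix: str = ""  # left empty when the short-term fix fully resolves the issue


@dataclass
class LineageReport:
    kind: str    # "root cause" or "impact"
    start: str   # point of initial inquiry
    path: list   # ordered data flow from the point of inquiry to the problem area
    findings: list = field(default_factory=list)

    def render(self):
        """Return a plain-text summary; a real report would add a lineage diagram."""
        lines = [
            f"Data lineage {self.kind} analysis report",
            f"Point of inquiry: {self.start}",
            "Flow: " + " -> ".join(self.path),
        ]
        for item in self.findings:
            lines.append(f"Problem at {item.asset}: {item.problem}")
            lines.append(f"  Short-term fix: {item.short_term_fix}")
            if item.long_term_fix:
                lines.append(f"  Long-term fix: {item.long_term_fix}")
        return "\n".join(lines)


# Hypothetical example values.
report = LineageReport(
    kind="root cause",
    start="revenue_report",
    path=["revenue_report", "sales_mart", "fx_rates"],
    findings=[
        Finding(
            asset="fx_rates",
            problem="stale exchange rates loaded",
            short_term_fix="reload the latest rates and rerun the report",
            long_term_fix="add a freshness check before each load",
        )
    ],
)
summary = report.render()
```

Keeping the short-term and long-term fixes as separate fields makes it obvious when a report offers only a stopgap.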