“At the roundabout, take the second exit. In 50 meters, turn right. You have reached your destination.”
Ahh, Waze. How did we find our way before you?
While 90%+ of Waze use happens in the zoomed-in, hyper-localized, directive-by-directive view, imagine how frustrating it would be if you wanted to zoom out to see the bigger picture of your journey… and Waze didn’t let you.
Zoom In, Zoom Out
For most data analysts and end-users, the vast majority of their time with data is spent in the zoomed-in view of their data environment: analytics and reporting.
Power BI is a robust data analytics and reporting platform that can pull in data from a variety of sources, then model it, visualize it and derive insights from it.
One of Power BI’s strong points is its dataset creation capability: datasets are slices of data relevant to particular business departments or audiences, and they can be shared with and reused by anyone who needs that particular combination of data assets or views.
A data analyst using Power BI doesn’t need to know where the data in her dataset originated. She doesn’t have to identify the source systems, dig into the ETL or rummage through data warehouses. This saves plenty of time and energy and makes her more efficient at her job.
After all, a specialty bread baker would generally prefer to go to a marketplace and buy the flour he needs to make a spectacular loaf of bread. Unless he aspires to be the Little Red Hen, he has no interest in planting the wheat, harvesting the wheat, grinding the wheat… just to get the flour.
So zoomed-in views are usually just what you need to get the job done, whether it’s driving to a destination, baking great bread or producing accurate and insightful data analysis.
Usually… unless something goes wrong.
There’s a traffic jam.
Your favorite bread flour hasn’t been in stock for the past two weeks.
The accuracy of your dataset is obviously deteriorating.
Now it’s time to zoom out, get the big picture of what’s going on and figure out what you can do to avoid or solve the problem.
Except what if you can’t zoom out? What if there are blinders or boundaries to how far you can see out of your current space?
Well then: you are STUCK.
Lineage View in Power BI: Helpful but Limited
Just as zooming out in Waze and viewing the map lets you understand the traffic flow, what’s happening where right now and what options you have, data lineage lets you see how data flows through your data environment. Where did a data asset originate? Where will it terminate? What transformations and pathways does it take along its journey?
For this reason, Power BI provides a “Lineage View” that shows high-level data lineage between artifacts within a Power BI workspace. Lineage View shows you where data entered that workspace and where it leaves it. For most users, most of the time, that may be enough.
The problems begin when you need to zoom out. Your department is considering a change to a business process. How is that going to affect your dataset? Or perhaps your revenue visualizations are showing strange discrepancies with what you know to be reality. Where is that data quality issue coming from?
It is here that you need to zoom out and see what parts of the dataset are feeding your reports, what database objects are fueling that dataset, what ETL processes are creating those database objects and what source systems are driving raw data to those ETL processes.
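Underneath, the zoom-out described above is a walk over a lineage graph: following edges backward (report to dataset to database objects to ETL to source systems) answers “where is this coming from?”, and following them forward answers “what will this change affect?” The sketch below is a minimal illustration under loudly stated assumptions: every asset name is hypothetical, the graph is hard-coded, and this is not how Power BI’s Lineage View or Octopai actually store or compute lineage.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (upstream asset, downstream asset).
# Names are illustrative only; they are not from Power BI or Octopai.
EDGES = [
    ("payments_app", "etl_load_payments"),        # source system -> ETL
    ("crm", "etl_load_customers"),                # source system -> ETL
    ("etl_load_payments", "dw.fact_revenue"),     # ETL -> database object
    ("etl_load_customers", "dw.dim_customer"),    # ETL -> database object
    ("dw.fact_revenue", "Finance dataset"),       # database -> Power BI dataset
    ("dw.dim_customer", "Finance dataset"),
    ("Finance dataset", "Revenue report"),        # dataset -> reports
    ("Finance dataset", "Quarterly KPI report"),
]


def build_graph(edges):
    """Return downstream and upstream adjacency maps for the lineage edges."""
    down, up = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        down[src].append(dst)
        up[dst].append(src)
    return down, up


def trace(graph, start):
    """Breadth-first walk from start; return every reachable asset in visit order."""
    seen, order = {start}, []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


down, up = build_graph(EDGES)

# Root-cause question: what feeds the revenue report?
print(trace(up, "Revenue report"))
# Impact question: what is affected if the payments source system changes?
print(trace(down, "payments_app"))
```

The upstream walk from “Revenue report” reaches the dataset, both warehouse tables, both ETL processes and both source systems; the downstream walk from “payments_app” reaches its ETL process, the revenue table, the Finance dataset and both reports, but not the CRM branch. A real lineage tool has to discover these edges automatically across many systems, which is exactly the part a single-workspace view cannot see.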
Because Power BI datasets are the basis for self-service reports for entire departments, an unresolved or unexpected issue in the dataset can affect hundreds of reports. But Power BI lineage can only zoom out so far and in so much detail, especially when it comes to connections to on-premises legacy systems.
Exactly which ETL process in your system is responsible for the inaccuracy in a specific segment of your finance department’s Power BI dataset? If your company decides to change the way it collects online payments, what impact will that have on your Power BI dataset?
Power BI’s built-in data lineage often comes up short in answering these types of questions.
Does that mean you’re stuck?
Getting Unstuck
When Octopai first developed its integration for Power BI, a primary motivator was the need to give Power BI users a clear view of Power BI’s data, and its impact, within the big picture of the entire data environment.
In addition, the lineage needed to be built around Power BI’s unique virtue: the dataset. A typical Octopai data lineage view shows a flow through an ETL process, into a database and over to analytics and reporting. For Power BI, we designed a dedicated lineage view that shows the Power BI dataset as a separate stage between the database assets and the analytics and reporting system.
This way of tracking and displaying Power BI data lineage makes it much simpler for data teams to investigate a data quality issue within a Power BI dataset. It also makes it easier to predict the impact of a business process change on a dataset, and realistic to take precautions so the change goes through without any negative impact on your datasets or reports.
The Best of Both Worlds
The dataset creation capability of Power BI enables analysts and business users to use data from legacy systems effectively, essentially because it lets them bypass the legacy system and just get the data into Power BI, where they can do what they want with it. But that bypass is a Catch-22: as soon as you need to understand more about the data’s connection to the legacy system, the bypass becomes bad news.
Giving data teams a way out of this Catch-22 was Octopai’s goal in its Power BI integration.
Most Power BI analysts and end-users are perfectly happy to use the zoomed-in view, focusing on a Power BI dataset and how they can use it to generate the business insight they need.
But when you hit an unexpected obstacle, being able to zoom out and understand exactly how your Power BI dataset fits into the big picture of your data environment may be what enables you to reach your destination.