How Advanced Data Lineage Can Improve BI Team Performance

Play Video

Do your BI & Analytics group’s daily tasks usually turn into weeks-long projects? Probably because almost everything is being done manually.


Watch this technical demo to learn how transforming your manual data lineage with automation can dramatically improve BI performance. You’ll get an in-depth technical demo of Octopai’s automated data lineage and see how BI teams can turn weeks of manual work into hours, and increase productivity by up to 50% while reducing risks.

Video Transcript

Asaf Melamed: Hello, everyone. Welcome. Welcome, everyone. Let’s give another minute or so for people to join in. In the meantime, you can sit back, relax, adjust the audio on your machine, and you’re going to enjoy the show very, very soon. We’ll just give another minute for people to join in. We will start the session together. Welcome. Welcome, everyone. Okay, so I’ll give people just a few more seconds to join in, and then I’ll start. In the meantime, you can sit back and relax, and enjoy the session.

Okay, so let’s begin. Let’s start. Hello, everyone. Welcome to the session in which we should see how automated data lineage can improve the performance of BI Teams. We’re very excited to have you all here. My name is Asaf Melamed. I’m the Head of Technical Account Management at Octopai. Just a few housekeeping rules before we begin. You shall all be muted throughout the session.

However, we will have a Q&A section at the end of the session. If you wish to ask any questions along the way, feel free to enter them in the area where you have a Ask a Question section on your screen. Let me just switch off my camera, and we’ll begin. All right, let’s start with a background about ourselves.

Octopai was started a number of years ago by our founders, who came from the BI domain, leading BI groups in different industries, where they faced many challenges, such as getting complaints from the business users about data reliability, accuracy, or corrupt reports with missing data, or issues related to finding where the data is, by understanding and discovering the metadata that resides in the various BI systems.

For this, they’ve created a solution that runs today in many organizations worldwide, from different industries such as manufacturing, healthcare, insurance, retail, gaming, and financial institutions too. How do we do it? All of that metadata that is so crucial to understand and difficult to collect is collected by us and placed into a cross-platform SaaS solution. We discover the metadata automatically with no manual processes, such as documentation, preparation, customizations, or professional services needed.

The metadata is then centralized, analyzed, modeled, passed, and then it is ready for discovery. You can easily find metadata in seconds with a mouse click. Octopai reduces the time it takes to do this from many days to seconds, providing the most accurate picture of your metadata. Not only is Octopai essential in the initial setup, collection, and cataloging, and analysis of your metadata, it is also essential moving forward.

Whenever you need to look for metadata next week, month, or year, you’ll be looking at the most current picture at that given point in time, not some spreadsheet that, with all good intentions, was created a few years ago and was never updated.

This is an example of a typical BI infrastructure that we commonly see amongst our customers. On the left-hand side, we can see examples of business applications being used by organizations, and consumed by thousands or even tens of thousands of business users, which do not have direct access to the data. The BI Team is responsible for making the data available to the business users.

This is why, at any given point in time, the BI Team needs to know where the data is, and understand its moving process throughout the different BI systems. This may be due to reporting errors or impact analysis issues. Because the metadata is scattered all over the BI landscape, our customers tell us that BI groups spend more than 50% of their time just trying to discover and understand where the metadata is, in order to understand the relationships, connections, and the data lineages too.

To overcome these challenges, we’ve leveraged technology to create a solution that extracts, centralizes, analyzes, and visualizes metadata automatically from your BI systems. We’re able to do this by simply extracting the metadata from the BI tools. This is then uploaded to our cloud for analysis, and then within 24 hours, you get an in-depth picture of your BI landscape. It’s as simple as that.

Octopai’s automated BI Intelligence Platform provides the following modules. The Automated Data Discovery enables you to instantly locate your metadata across the BI systems. The Automated Data Lineage XD provides you with a complete view of your entire data journey, and the Automated Data Catalog enables you to manage your data assets across the organization.

Octopai’s Data Lineage XD provides you with three different types of lineages, where you can find everything that you need in an easy-to-use platform. Cross System Lineage provides an end-to-end lineage at the system level, from the entry point into the BI landscape, all the way downstream, to the reporting and analytics. The Inner System Lineage details the column level lineage within an ETL process report or database object.

Understanding the logic and data flow for each column provides visibility at the column level, no matter how complex the process report or object is, and End-to-End Column Lineage details column-to-column-level lineage between systems, from the entry points into the BI landscape, all the way through to the reporting and analytics.

When coming to us, most of the customers have mentioned facing the following similar challenges, building new processes or changes in existing processes, mergers and acquisitions, impact analysis when changes are being made, how do you know what will be the impact of those changes before making them, or going to production, and also, reporting errors.

In today’s session, we should be demonstrating two very common use cases which BI Teams face during the ongoing maintenance activities, such as root cause analysis and impact analysis. Let’s start with our first use case, which is root cause analysis. Let’s imagine that you’re busy working on your month and the release to production, and to only have one day to have everything ready for the release.

Suddenly, you receive a new ticket. Your sales operation manager has opened a high priority ticket regarding in the field “TotalProductCost” in the report “Customer Sales by Year”, claiming that that field isn’t accurate. Now, you think to yourself, “Oh, great. A new ticket. Just what I needed now.” You realize that you need to investigate the report structure and might need to contact the DBA to understand the exact table names and views that relate to the creation of that report, and perhaps, something changed in the ETL processes associated with these tables of the report.

I can share with you that one that my customers told us before that they used Octopai, it used to take them hours and sometimes even days to do these root cause analysis investigations, which could involve also different teams. Don’t worry. Let’s see how this can be done with Octopai quickly. You’re looking now at the lineage module of Octopai. On the left-hand side, we can see the different levels of lineages, Cross System Lineage, Inner System Lineage, and End-to-End Column Lineage, which we’re looking at now.

You can also see the other modules of Octopai such as the metadata discovery, and also the Automated Data Catalog. On the left-hand side, underneath this blue bar, we can see the ETLs that came from our systems, SQL Server, SSIS, ADF, and Snowflake. These also include the store procedures, which can also load data into the database tables, which appear under the red bar here.

In the red bar, we can see here below different database objects, such as views, store procedures, and so forth. They came from SQL Server, Snowflake and Oracle in our system. By the way, you can also see the SSAS objects, too. On the right-hand side, you can see the reporting tools underneath the green bar. In our system, they came from Power BI, Tableau, and SSIS.

Let’s start investigating the end-to-end lineage of the column TotalProductCost.

[silence] Now, you can see over here, in fact, that column appears in two reports. Customer Sales by Year is the report that we’re looking for. Let’s click on the End-to-End. Now, I will show you the entire end-to-end lineage from the report all the way back, and we’re going to travel through different database objects, ETLs, all the way to the source system. First, let’s see what we have on the screen.

On the right-hand side in the green frame is the report. Customer Sales by Year.

It came from SSRS. In the red frame, we can see the database objects, and in the blue frame, we can see the ETLs. Within the reports, we can see the target column, this is the column TotalProductCost, which the end-user has complained about. We can see that the source column in green is also TotalProductCost and that came from the DwhFactSales table and the Dwh_Sales database.

We can see that it is getting populated by the Load DWH ETL that came from SSRS over here. Now, we can see that the source column of that ETL is also TotalProductCost. To see the lineage, you just need to follow the blue line backwards going upstream. That table is in the Staging Sales database. Going ahead, we can see that that table is getting populated by two ETLs.

The first top one is Load Staging, where we can see that the column TotalProductCost originates from three columns. UnitPrice, OrderQty, and UnitPriceDiscount that came from the Mrr_Sales database. Those are getting populated by the ETL Load MRR. The columns here originate from these three columns in the SALESORDERDETAIL table, which belongs to the AdventureWorks database, which is practically our ultimate source.

As you can see, by isolating our focus only on the columns that you can see over here in the lineage, we are not seeing the hundreds of other columns within these objects that are involved in the upstream data flow. Now, from looking at this ETL over here, I suspect that here’s the culprit of our issue. Let’s dive into these objects to see more detailed information. Let’s start from the beginning.

I would like to show you now the Inner System Lineage of the Customer Sales by Year report. How? Click over here on the Inner System Lineage link. This will open up the inner system lineage of that report, showing you the report’s column-to-column relationships flow diagram, from the physical layer in green is actually the database tables, through the semantic layer in orange, where you have all the transformations and calculations, to the presentation layer, this which you can see in red over here.

The column that we’re talking about his TotalProductCost and to see the lineage of that column, just follow the blue line again all the way back, and you can see that it came from the column TotalProductCost in this report from the DwhFactSales table in the Dwh_Sales database. I can go back and continue my root cause analysis investigation. I can go to the end-to-end column lineage, and look at the end-to-end column lineage on this level, or I can also go to the Cross System Lineage by clicking on the link over here, of Cross System Lineage, which provides us with a 30,000-feet view of the lineages between systems on an object level. Let’s click on that.

What you can see on the screen are the different objects involved in the lineage. Let’s go through the different circles over here. On the right-hand side, in green, is the report. Customer Sales by Year shown from SSRS. That report is associated with these four SQL Server tables shown in red, and they are populated by these ETLs shown in blue, which in our system came from Informatica, ATF, or SSIS. Now, I can continue my root cause analysis investigation on this level.

For example, if I’d like to see the objects that are the sources of this ETL, I can click on that, click on the back arrow, and now you will see the different SQL Server tables shown in red, which populates that ETL over here, including the Staging Fact Sales table coming from SQL Server. If I’d like to see what populated that table, Staging Fact Sales, click again on that circle. Click on the left arrow again. Now, you’ll see the three ETLs populating that table.

Those ETLs came from SSIS and Informatica here. We’re going to focus on the Load Staging ETL on the left-hand side. I would like to now drill into that Load Staging ETL, and look at the inner system lineage to see more granular information. Let’s dive into that. We’re now going to go to the Load Staging ETL. Let’s click on Drill. Here, again, you can see the entire column-to-column transformations within this ETL, where we can see the source tables in green, transformations in orange, and the target tables are shown in red over here.

The column that we’re focusing on is TotalProductCost. Let’s take a look at that. In this ETL, it is in the Staging Fact Sales table within the Staging Sales database. If I’d like to see where that came from, just follow the blue line here. It went into that transformation and originates from these three columns, OrderQty, UnitPrice and UnitPriceDiscount, which we’ve seen before. Those are in the MrrSalesOrderDetail table.

I’d like to see now the specific SQL scripts of the transformation, let’s click on that. On the right-hand side, you can see the entire script within that transformation including the definition of the column, TotalProductCost. You can see that it came from this formula, OrderQty multiplied by UnitPrice multiplied by 1 minus UnitPriceDiscount. Now, I know from a very reliable source that this formula was actually recently changed.

How do I know that? Actually, I made the change, but I forgot to document that. We’re going to talk about documentation in the next use case, but what I’d like to explain and recap on what we’ve seen so far is that we’ve done the root cause analysis on three levels. We turn it on the Cross System Lineage level with the objects. We started off with the end-to-end column lineage with the different columns, and ended up in the Inner System Lineage over here.

We’re looking again at the TotalProductCost and the transformation here. I can now show you that the full end-to-end root cause analysis investigation can be performed by one person in Octopai without the need to involve additional teams to obtain this information. I can go ahead and change that formula, close the ticket, and go back to working on my month end release.

Now, let’s move ahead to the next use case, which is Impact Analysis. One day, your ERP logistic team lead comes up to you and informs you that they have already increased the size of the field “ProductCategoryName” from 50 to 100 characters and you think to yourself, “Couldn’t they have informed ahead of time?” Now, you’re faced with the fact that you need to find its location in entire BI landscape, understanding how long your project will take, and how many resources you need to allocate to it, and that might involve different teams too.

Let’s see how your life will look like with Octopai, understanding exactly where that field is found in your BI systems, so you can plan accurately for performing this change and keep your business users happy. Let’s go to the metadata discovery module of Octopai. The metadata discovery module is actually a textual documentation of your BI landscape. You can search for any object, text, or even formula.

You can actually schedule to automatically run the upload of your metadata extractions to Octopai on a regular basis. Each time that your metadata is uploaded to Octopai, we refresh your documentation of your BI landscape, so that it is always kept up-to-date. Now, we need to scope a project, as we need to make changes to a certain field ProductCategoryName. I’m going to type in the word “Product”. As you can see Octopai gives us the top suggestions, where it found that word, in the ETLs, databases, analysis tools or reports. When I click on Enter, Octopai starts to search for that word within the different systems that we have here in Octopai.

You can see in green that we have searched the different entities, and we are showing you the number of times where we have found the word “Product”. For example, product can be found 13 times in the table sources of Informatica, or over here below, 97 times in the columns of Snowflake. The gray boxes represent the objects that we’ve searched, but we have not found the word “Product” in them.

We recommend that you don’t waste your time looking for the word “Product” in the constraints here, focus on the green objects, where it is found. You can actually dive deeper into anyone of these boxes to see additional information. Let’s start with the SSAS packages, these are the ETLs. Let’s take a look at them. What we see over here is this. In this table format, we give you the ETL name such as Load DWH.

We give you the database name, schema, the table name, and also the column name which is found over here. Specifically, there is ProductCategoryName, which we need to change. I can go ahead and filter all the information here. I can also filter the columns and change the different filters like Contains where it starts with ends with. I can export everything to Excel, so that you can actually collaborate between your group members. You can export this to Excel, and share the information within your organization.

Actually, everything is fully exportable from Octopai including the lineages, the visualizations that you’ve seen previously. To see the visualization of the ETL load DWH, I can click on the SSIS Map here on the right-hand-side. This will bring us to the inner system lineage of that ETL. Here we see the source table containing ProductCategoryName, then we can see over here the transformations, all the way to the target table containing ProductCategoryName shown in the red table here.

Let’s go back to the Discovery module. Now, I’d like to show you an additional source. Let’s take a look reports that contain that column. Down below, we can see over here the different systems including SSRS. Let’s go to the Report Details. In the Report Details, we can see specifically over here the report name, which is product by category and sub category, will give you the path, the data set query.

We also give you the database name, schema name, table name, and we also can see the different columns. Here, specifically, there’s ProductCategoryName, the column that we need to change. You can actually see additional information by clicking on the plus sign on the left-hand-side, where we give you the URL, the connection string and other fields too.

If you’d like to see this visualization again of that report, just click on the link over here.

We come back to the inner system lineage of the report, where we have the column ProductCategoryName in the physical layer, semantic layer, and the presentation layer too. The last source which I’d like to show you are the SQL Server columns. We’re going now to the database, SQL Server. Let’s click now on the columns. Here we give you a list of all the different columns, and we give you the server name, database name, schema name, the object name highlighted in red, you can see here the word “Product”.

We give you the table name, column name, and the data type too. Down below, I can expand the amount of items that we see on the screen, and let’s sort it by column name. If we go down here below, I would like to show you where the ProductCategoryName column is found. Here we see it. It is found in these databases, that schema, and in those different tables and views.

We can see that ProductCategoryName is defined now as nvarchar(50), which we now need to change to nvarchar(100) as part of our project. Let’s summarize what we’ve seen in the metadata discovery module. By searching for the item, we found its location in the entire BI landscape, including the systems and their definitions. Now, we can plan how long this project will take, and the number of resources that we need to allocate, perhaps, from different teams, if you have different teams that have handle ETLs, databases, or reporting tools too.

This concludes the session in which we have seen how automated data lineage and metadata discovery can be used to improve the performance of the BI Teams for a variety of use cases. For a free trial of Octopai, you can get in touch with me through my email that you can see on the screen now, or schedule a session, a demo actually on our website.

I’d like to open the session now for questions from the audience. I have already quite a few questions. Feel free now to type in the Ask a Question area on your screen for any questions that you’d like to ask, and thank you for participating in the session. First, I’d like to go ahead and ask the following question. The first question I have is, how long does it take to start using the product? All right, actually getting up and running with Octopai is done very quickly.

It’s done actually in less than 24 hours. You just need to upload you metadata to Octopai, we analyze it, and then after a few hours, you can already start working with Octopai. Then, a lot of questions came in regarding training. Is there any special training that is needed to use Octopai? I can tell you that customers tell us that, as you can see on the screen, it’s quite intuitive.

There’s no special training, however, as part of our on-boarding process, your dedicated customer success manager does do training. In the beginning, actually, we perform several trainings if needed to additional groups or new hires in your organization and so forth. Then, I have another question, how is the software delivered? I should point out that Octopai is a SaaS solution, so everything is in the cloud, and no installation needed on your side. It is a cloud solution.

Great. Then, another question is, how do you extract the metadata from the different BI systems? We have extractors built for the different sources that you have seen actually previously in the presentation. We have the different extractors for the ETLs, databases. You just need to run the extractions. We receive it when you upload into Octopai. You can, by the way, automate as I mentioned the uploads to Octopai.

Then, we analyze it, and it is ready for your use for all those different use cases, whether those are lineages, metadata discovery, and the data catalog too. All right, I have another question regarding comparison between Octopai and Purview. Recently, we hear that there are more organizations out there that are looking at Azure Purview, which is a great tool of course.

Without going into too many details, Purview has it’s advantages, of course, focusing specifically on the Microsoft tools. Octopai is not limited, I should mention, to Microsoft. We do have the capability to cover other tools, Oracle, MicroStrategy and many other tools apart from Microsoft. Then, the functionality is quite similar. However, if you like to get into more details, please feel free to contact me.

You can send a email and we can go into more details regarding the comparison. Good, there is another question regarding, okay there are quite a few steps to get to the specific objects that need to be changed. I would like to mention that, in this session, I’ve shown a lot of different capabilities of Octopai. When you need to focus just on a specific object, for example, you might remember the case, where I’ve shown you in the use case number two that there was a change to a specific column from the ERP system.

I could have searched just by that column. Then, you will see only that column in an ETL, in a report, in the database. I clicked on many buttons to show the capabilities of Octopai, but it can be done by focusing just on where you want to look.

Great. Next question, how is the best and fastest way to uploading this information in ETLs, schemas, in order to make it automatic? Well, actually everything is automated, I’ll explain. You have from Octopai, we give you extractors for your specific tools, for your ETLs, your databases, your reporting tools, which are downloaded from Octopai. Your customer success manager will help you configure them for the first initial metadata extraction.

Then, you run the extraction, which takes about half an hour depending on the size of the system. They are uploaded to us. You can automate those uploads to run frequently on a regular basis, such as once a week, depends on how frequently you do your releases. Everything is fully automated, you don’t need to go ahead, and in Octopai write down the schemas or anything else, it is automated.

Great. Next question, does Octopai have business lineage? That actually is connected to our metadata, or I apologize, Automated Data Catalog regarding business lineages, so regarding lineages that contain the business descriptions, once you use Octopai’s data catalog, you can upload, you can enter the business descriptions, you can tag objects, tag assets actually, such as PII, GDPR, depends where you reside. That will be reflected in the lineages of Octopai. Okay.

Then, I have a question again regarding comparison between us and other systems. The major differentiators between Octopai and Manta. We get that quite often. I should mention that out there, there are a lot of very good tools that have very good capabilities very similar to Octopai. I should mention that many tools out there, if you compare also Manta to other tools too, like Purview and so forth. Many tools have very similar capabilities, lineages, and so forth.

I can just share one thing regarding the comparison between us and other tools out there, is the feedback that we have from our customers. We’ve received very high reviews regarding two areas. One is a whole contracts negotiation process, which we’ve been told that it is very easy to buy Octopai. Another area is our customer support, all the customer success management services that we’ve provided to our customers, we’ve received quite good feedback. If you want to get more information, I do suggest that you schedule a demo with us, and see Octopai in action on your own metadata.

Good. Another question I have here, is the catalog module an additional cost to the license or a lease? Okay, so I’ll just talk about licensing because people out there might ask about the licensing of Octopai, not just the data catalog. In general, the licensing of Octopai works in the following way, it is by modules of Octopai, such as the metadata discovery, the lineage, and the data catalog.

It is also by the BI system sources that you have, so depends on the amount of ETLs, databases, and analysis tools, and reporting tools. Each one of them is counted in the licensing of Octopai. Great. Thank you.

Thank you for all these great questions, and I truly appreciate the participation. Maybe I’ll just give you all another minute or so to answer another question if you have. Okay, I’ll just go back over the different questions. Okay, so I see no other questions are out there.

In that case, what I like to do, if there aren’t any other questions. First, I’d like to thank you all for participating in the session, we know that you’re all busy, and so we truly appreciate that you’ve taken the time out of your busy schedules to see this demonstration today of Octopai’s capabilities. Again, I’d like to invite you to contact us if you have any additional questions or schedule a demo on our website.

With that, I like to conclude today’s session, and thank you all for your participation. Wish you all to keep safe in these interesting times. Thank you. Take care.

Video Transcript

Asaf Melamed: Hello, everyone. Welcome. Welcome, everyone. Let’s give another minute or so for people to join in. In the meantime, you can sit back, relax, adjust the audio on your machine, and you’re going to enjoy the show very, very soon. We’ll just give another minute for people to join in. We will start the session together. Welcome. Welcome, everyone. Okay, so I’ll give people just a few more seconds to join in, and then I’ll start. In the meantime, you can sit back and relax, and enjoy the session.

Okay, so let’s begin. Let’s start. Hello, everyone. Welcome to the session in which we should see how automated data lineage can improve the performance of BI Teams. We’re very excited to have you all here. My name is Asaf Melamed. I’m the Head of Technical Account Management at Octopai. Just a few housekeeping rules before we begin. You shall all be muted throughout the session.

However, we will have a Q&A section at the end of the session. If you wish to ask any questions along the way, feel free to enter them in the area where you have a Ask a Question section on your screen. Let me just switch off my camera, and we’ll begin. All right, let’s start with a background about ourselves.

Octopai was started a number of years ago by our founders, who came from the BI domain, leading BI groups in different industries, where they faced many challenges, such as getting complaints from the business users about data reliability, accuracy, or corrupt reports with missing data, or issues related to finding where the data is, by understanding and discovering the metadata that resides in the various BI systems.

For this, they’ve created a solution that runs today in many organizations worldwide, from different industries such as manufacturing, healthcare, insurance, retail, gaming, and financial institutions too. How do we do it? All of that metadata that is so crucial to understand and difficult to collect is collected by us and placed into a cross-platform SaaS solution. We discover the metadata automatically with no manual processes, such as documentation, preparation, customizations, or professional services needed.

The metadata is then centralized, analyzed, modeled, passed, and then it is ready for discovery. You can easily find metadata in seconds with a mouse click. Octopai reduces the time it takes to do this from many days to seconds, providing the most accurate picture of your metadata. Not only is Octopai essential in the initial setup, collection, and cataloging, and analysis of your metadata, it is also essential moving forward.

Whenever you need to look for metadata next week, month, or year, you’ll be looking at the most current picture at that given point in time, not some spreadsheet that, with all good intentions, was created a few years ago and was never updated.

This is an example of a typical BI infrastructure that we commonly see amongst our customers. On the left-hand side, we can see examples of business applications being used by organizations, and consumed by thousands or even tens of thousands of business users, which do not have direct access to the data. The BI Team is responsible for making the data available to the business users.

This is why, at any given point in time, the BI Team needs to know where the data is, and understand its moving process throughout the different BI systems. This may be due to reporting errors or impact analysis issues. Because the metadata is scattered all over the BI landscape, our customers tell us that BI groups spend more than 50% of their time just trying to discover and understand where the metadata is, in order to understand the relationships, connections, and the data lineages too.

To overcome these challenges, we’ve leveraged technology to create a solution that extracts, centralizes, analyzes, and visualizes metadata automatically from your BI systems. We’re able to do this by simply extracting the metadata from the BI tools. This is then uploaded to our cloud for analysis, and then within 24 hours, you get an in-depth picture of your BI landscape. It’s as simple as that.

Octopai’s automated BI Intelligence Platform provides the following modules. The Automated Data Discovery enables you to instantly locate your metadata across the BI systems. The Automated Data Lineage XD provides you with a complete view of your entire data journey, and the Automated Data Catalog enables you to manage your data assets across the organization.

Octopai’s Data Lineage XD provides you with three different types of lineages, where you can find everything that you need in an easy-to-use platform. Cross System Lineage provides an end-to-end lineage at the system level, from the entry point into the BI landscape, all the way downstream, to the reporting and analytics. The Inner System Lineage details the column level lineage within an ETL process report or database object.

Understanding the logic and data flow for each column provides visibility at the column level, no matter how complex the process report or object is, and End-to-End Column Lineage details column-to-column-level lineage between systems, from the entry points into the BI landscape, all the way through to the reporting and analytics.

When coming to us, most of the customers have mentioned facing the following similar challenges, building new processes or changes in existing processes, mergers and acquisitions, impact analysis when changes are being made, how do you know what will be the impact of those changes before making them, or going to production, and also, reporting errors.

In today’s session, we should be demonstrating two very common use cases which BI Teams face during the ongoing maintenance activities, such as root cause analysis and impact analysis. Let’s start with our first use case, which is root cause analysis. Let’s imagine that you’re busy working on your month and the release to production, and to only have one day to have everything ready for the release.

Suddenly, you receive a new ticket. Your sales operation manager has opened a high priority ticket regarding in the field “TotalProductCost” in the report “Customer Sales by Year”, claiming that that field isn’t accurate. Now, you think to yourself, “Oh, great. A new ticket. Just what I needed now.” You realize that you need to investigate the report structure and might need to contact the DBA to understand the exact table names and views that relate to the creation of that report, and perhaps, something changed in the ETL processes associated with these tables of the report.

I can share with you that one that my customers told us before that they used Octopai, it used to take them hours and sometimes even days to do these root cause analysis investigations, which could involve also different teams. Don’t worry. Let’s see how this can be done with Octopai quickly. You’re looking now at the lineage module of Octopai. On the left-hand side, we can see the different levels of lineages, Cross System Lineage, Inner System Lineage, and End-to-End Column Lineage, which we’re looking at now.

You can also see the other modules of Octopai such as the metadata discovery, and also the Automated Data Catalog. On the left-hand side, underneath this blue bar, we can see the ETLs that came from our systems, SQL Server, SSIS, ADF, and Snowflake. These also include the store procedures, which can also load data into the database tables, which appear under the red bar here.

In the red bar, we can see here below different database objects, such as views, store procedures, and so forth. They came from SQL Server, Snowflake and Oracle in our system. By the way, you can also see the SSAS objects, too. On the right-hand side, you can see the reporting tools underneath the green bar. In our system, they came from Power BI, Tableau, and SSIS.

Let’s start investigating the end-to-end lineage of the column TotalProductCost.

[silence] Now, you can see over here, in fact, that column appears in two reports. Customer Sales by Year is the report that we’re looking for. Let’s click on the End-to-End. Now, I will show you the entire end-to-end lineage from the report all the way back, and we’re going to travel through different database objects, ETLs, all the way to the source system. First, let’s see what we have on the screen.

On the right-hand side in the green frame is the report. Customer Sales by Year.

It came from SSRS. In the red frame, we can see the database objects, and in the blue frame, we can see the ETLs. Within the reports, we can see the target column, this is the column TotalProductCost, which the end-user has complained about. We can see that the source column in green is also TotalProductCost and that came from the DwhFactSales table and the Dwh_Sales database.

We can see that it is getting populated by the Load DWH ETL that came from SSRS over here. Now, we can see that the source column of that ETL is also TotalProductCost. To see the lineage, you just need to follow the blue line backwards going upstream. That table is in the Staging Sales database. Going ahead, we can see that that table is getting populated by two ETLs.

The first top one is Load Staging, where we can see that the column TotalProductCost originates from three columns. UnitPrice, OrderQty, and UnitPriceDiscount that came from the Mrr_Sales database. Those are getting populated by the ETL Load MRR. The columns here originate from these three columns in the SALESORDERDETAIL table, which belongs to the AdventureWorks database, which is practically our ultimate source.

As you can see, by isolating our focus only on the columns that you can see over here in the lineage, we are not seeing the hundreds of other columns within these objects that are involved in the upstream data flow. Now, from looking at this ETL over here, I suspect that here’s the culprit of our issue. Let’s dive into these objects to see more detailed information. Let’s start from the beginning.

I would like to show you now the Inner System Lineage of the Customer Sales by Year report. How? Click over here on the Inner System Lineage link. This will open up the inner system lineage of that report, showing you the report’s column-to-column relationships flow diagram, from the physical layer in green is actually the database tables, through the semantic layer in orange, where you have all the transformations and calculations, to the presentation layer, this which you can see in red over here.

The column that we’re talking about his TotalProductCost and to see the lineage of that column, just follow the blue line again all the way back, and you can see that it came from the column TotalProductCost in this report from the DwhFactSales table in the Dwh_Sales database. I can go back and continue my root cause analysis investigation. I can go to the end-to-end column lineage, and look at the end-to-end column lineage on this level, or I can also go to the Cross System Lineage by clicking on the link over here, of Cross System Lineage, which provides us with a 30,000-feet view of the lineages between systems on an object level. Let’s click on that.

What you can see on the screen are the different objects involved in the lineage. Let’s go through the different circles over here. On the right-hand side, in green, is the report. Customer Sales by Year shown from SSRS. That report is associated with these four SQL Server tables shown in red, and they are populated by these ETLs shown in blue, which in our system came from Informatica, ATF, or SSIS. Now, I can continue my root cause analysis investigation on this level.

For example, if I’d like to see the objects that are the sources of this ETL, I can click on that, click on the back arrow, and now you will see the different SQL Server tables shown in red, which populates that ETL over here, including the Staging Fact Sales table coming from SQL Server. If I’d like to see what populated that table, Staging Fact Sales, click again on that circle. Click on the left arrow again. Now, you’ll see the three ETLs populating that table.

Those ETLs came from SSIS and Informatica here. We’re going to focus on the Load Staging ETL on the left-hand side. I would like to now drill into that Load Staging ETL, and look at the inner system lineage to see more granular information. Let’s dive into that. We’re now going to go to the Load Staging ETL. Let’s click on Drill. Here, again, you can see the entire column-to-column transformations within this ETL, where we can see the source tables in green, transformations in orange, and the target tables are shown in red over here.

The column that we’re focusing on is TotalProductCost. Let’s take a look at that. In this ETL, it is in the Staging Fact Sales table within the Staging Sales database. If I’d like to see where that came from, just follow the blue line here. It went into that transformation and originates from these three columns, OrderQty, UnitPrice and UnitPriceDiscount, which we’ve seen before. Those are in the MrrSalesOrderDetail table.

I’d like to see now the specific SQL scripts of the transformation, let’s click on that. On the right-hand side, you can see the entire script within that transformation including the definition of the column, TotalProductCost. You can see that it came from this formula, OrderQty multiplied by UnitPrice multiplied by 1 minus UnitPriceDiscount. Now, I know from a very reliable source that this formula was actually recently changed.

How do I know that? Actually, I made the change, but I forgot to document that. We’re going to talk about documentation in the next use case, but what I’d like to explain and recap on what we’ve seen so far is that we’ve done the root cause analysis on three levels. We turn it on the Cross System Lineage level with the objects. We started off with the end-to-end column lineage with the different columns, and ended up in the Inner System Lineage over here.

We’re looking again at the TotalProductCost and the transformation here. I can now show you that the full end-to-end root cause analysis investigation can be performed by one person in Octopai without the need to involve additional teams to obtain this information. I can go ahead and change that formula, close the ticket, and go back to working on my month end release.

Now, let’s move ahead to the next use case, which is Impact Analysis. One day, your ERP logistic team lead comes up to you and informs you that they have already increased the size of the field “ProductCategoryName” from 50 to 100 characters and you think to yourself, “Couldn’t they have informed ahead of time?” Now, you’re faced with the fact that you need to find its location in entire BI landscape, understanding how long your project will take, and how many resources you need to allocate to it, and that might involve different teams too.

Let’s see how your life will look like with Octopai, understanding exactly where that field is found in your BI systems, so you can plan accurately for performing this change and keep your business users happy. Let’s go to the metadata discovery module of Octopai. The metadata discovery module is actually a textual documentation of your BI landscape. You can search for any object, text, or even formula.

You can actually schedule to automatically run the upload of your metadata extractions to Octopai on a regular basis. Each time that your metadata is uploaded to Octopai, we refresh your documentation of your BI landscape, so that it is always kept up-to-date. Now, we need to scope a project, as we need to make changes to a certain field ProductCategoryName. I’m going to type in the word “Product”. As you can see Octopai gives us the top suggestions, where it found that word, in the ETLs, databases, analysis tools or reports. When I click on Enter, Octopai starts to search for that word within the different systems that we have here in Octopai.

You can see in green that we have searched the different entities, and we are showing you the number of times where we have found the word “Product”. For example, product can be found 13 times in the table sources of Informatica, or over here below, 97 times in the columns of Snowflake. The gray boxes represent the objects that we’ve searched, but we have not found the word “Product” in them.

We recommend that you don’t waste your time looking for the word “Product” in the constraints here, focus on the green objects, where it is found. You can actually dive deeper into anyone of these boxes to see additional information. Let’s start with the SSAS packages, these are the ETLs. Let’s take a look at them. What we see over here is this. In this table format, we give you the ETL name such as Load DWH.

We give you the database name, schema, the table name, and also the column name which is found over here. Specifically, there is ProductCategoryName, which we need to change. I can go ahead and filter all the information here. I can also filter the columns and change the different filters like Contains where it starts with ends with. I can export everything to Excel, so that you can actually collaborate between your group members. You can export this to Excel, and share the information within your organization.

Actually, everything is fully exportable from Octopai including the lineages, the visualizations that you’ve seen previously. To see the visualization of the ETL load DWH, I can click on the SSIS Map here on the right-hand-side. This will bring us to the inner system lineage of that ETL. Here we see the source table containing ProductCategoryName, then we can see over here the transformations, all the way to the target table containing ProductCategoryName shown in the red table here.

Let’s go back to the Discovery module. Now, I’d like to show you an additional source. Let’s take a look reports that contain that column. Down below, we can see over here the different systems including SSRS. Let’s go to the Report Details. In the Report Details, we can see specifically over here the report name, which is product by category and sub category, will give you the path, the data set query.

We also give you the database name, schema name, table name, and we also can see the different columns. Here, specifically, there’s ProductCategoryName, the column that we need to change. You can actually see additional information by clicking on the plus sign on the left-hand-side, where we give you the URL, the connection string and other fields too.

If you’d like to see this visualization again of that report, just click on the link over here.

We come back to the inner system lineage of the report, where we have the column ProductCategoryName in the physical layer, semantic layer, and the presentation layer too. The last source which I’d like to show you are the SQL Server columns. We’re going now to the database, SQL Server. Let’s click now on the columns. Here we give you a list of all the different columns, and we give you the server name, database name, schema name, the object name highlighted in red, you can see here the word “Product”.

We give you the table name, column name, and the data type too. Down below, I can expand the amount of items that we see on the screen, and let’s sort it by column name. If we go down here below, I would like to show you where the ProductCategoryName column is found. Here we see it. It is found in these databases, that schema, and in those different tables and views.

We can see that ProductCategoryName is defined now as nvarchar(50), which we now need to change to nvarchar(100) as part of our project. Let’s summarize what we’ve seen in the metadata discovery module. By searching for the item, we found its location in the entire BI landscape, including the systems and their definitions. Now, we can plan how long this project will take, and the number of resources that we need to allocate, perhaps, from different teams, if you have different teams that have handle ETLs, databases, or reporting tools too.

This concludes the session in which we have seen how automated data lineage and metadata discovery can be used to improve the performance of the BI Teams for a variety of use cases. For a free trial of Octopai, you can get in touch with me through my email that you can see on the screen now, or schedule a session, a demo actually on our website.

I’d like to open the session now for questions from the audience. I have already quite a few questions. Feel free now to type in the Ask a Question area on your screen for any questions that you’d like to ask, and thank you for participating in the session. First, I’d like to go ahead and ask the following question. The first question I have is, how long does it take to start using the product? All right, actually getting up and running with Octopai is done very quickly.

It’s done actually in less than 24 hours. You just need to upload you metadata to Octopai, we analyze it, and then after a few hours, you can already start working with Octopai. Then, a lot of questions came in regarding training. Is there any special training that is needed to use Octopai? I can tell you that customers tell us that, as you can see on the screen, it’s quite intuitive.

There’s no special training, however, as part of our on-boarding process, your dedicated customer success manager does do training. In the beginning, actually, we perform several trainings if needed to additional groups or new hires in your organization and so forth. Then, I have another question, how is the software delivered? I should point out that Octopai is a SaaS solution, so everything is in the cloud, and no installation needed on your side. It is a cloud solution.

Great. Then, another question is, how do you extract the metadata from the different BI systems? We have extractors built for the different sources that you have seen actually previously in the presentation. We have the different extractors for the ETLs, databases. You just need to run the extractions. We receive it when you upload into Octopai. You can, by the way, automate as I mentioned the uploads to Octopai.

Then, we analyze it, and it is ready for your use for all those different use cases, whether those are lineages, metadata discovery, and the data catalog too. All right, I have another question regarding comparison between Octopai and Purview. Recently, we hear that there are more organizations out there that are looking at Azure Purview, which is a great tool of course.

Without going into too many details, Purview has it’s advantages, of course, focusing specifically on the Microsoft tools. Octopai is not limited, I should mention, to Microsoft. We do have the capability to cover other tools, Oracle, MicroStrategy and many other tools apart from Microsoft. Then, the functionality is quite similar. However, if you like to get into more details, please feel free to contact me.

You can send a email and we can go into more details regarding the comparison. Good, there is another question regarding, okay there are quite a few steps to get to the specific objects that need to be changed. I would like to mention that, in this session, I’ve shown a lot of different capabilities of Octopai. When you need to focus just on a specific object, for example, you might remember the case, where I’ve shown you in the use case number two that there was a change to a specific column from the ERP system.

I could have searched just by that column. Then, you will see only that column in an ETL, in a report, in the database. I clicked on many buttons to show the capabilities of Octopai, but it can be done by focusing just on where you want to look.

Great. Next question, how is the best and fastest way to uploading this information in ETLs, schemas, in order to make it automatic? Well, actually everything is automated, I’ll explain. You have from Octopai, we give you extractors for your specific tools, for your ETLs, your databases, your reporting tools, which are downloaded from Octopai. Your customer success manager will help you configure them for the first initial metadata extraction.

Then, you run the extraction, which takes about half an hour depending on the size of the system. They are uploaded to us. You can automate those uploads to run frequently on a regular basis, such as once a week, depends on how frequently you do your releases. Everything is fully automated, you don’t need to go ahead, and in Octopai write down the schemas or anything else, it is automated.

Great. Next question, does Octopai have business lineage? That actually is connected to our metadata, or I apologize, Automated Data Catalog regarding business lineages, so regarding lineages that contain the business descriptions, once you use Octopai’s data catalog, you can upload, you can enter the business descriptions, you can tag objects, tag assets actually, such as PII, GDPR, depends where you reside. That will be reflected in the lineages of Octopai. Okay.

Then, I have a question again regarding comparison between us and other systems. The major differentiators between Octopai and Manta. We get that quite often. I should mention that out there, there are a lot of very good tools that have very good capabilities very similar to Octopai. I should mention that many tools out there, if you compare also Manta to other tools too, like Purview and so forth. Many tools have very similar capabilities, lineages, and so forth.

I can just share one thing regarding the comparison between us and other tools out there, is the feedback that we have from our customers. We’ve received very high reviews regarding two areas. One is a whole contracts negotiation process, which we’ve been told that it is very easy to buy Octopai. Another area is our customer support, all the customer success management services that we’ve provided to our customers, we’ve received quite good feedback. If you want to get more information, I do suggest that you schedule a demo with us, and see Octopai in action on your own metadata.

Good. Another question I have here, is the catalog module an additional cost to the license or a lease? Okay, so I’ll just talk about licensing because people out there might ask about the licensing of Octopai, not just the data catalog. In general, the licensing of Octopai works in the following way, it is by modules of Octopai, such as the metadata discovery, the lineage, and the data catalog.

It is also by the BI system sources that you have, so depends on the amount of ETLs, databases, and analysis tools, and reporting tools. Each one of them is counted in the licensing of Octopai. Great. Thank you.

Thank you for all these great questions, and I truly appreciate the participation. Maybe I’ll just give you all another minute or so to answer another question if you have. Okay, I’ll just go back over the different questions. Okay, so I see no other questions are out there.

In that case, what I like to do, if there aren’t any other questions. First, I’d like to thank you all for participating in the session, we know that you’re all busy, and so we truly appreciate that you’ve taken the time out of your busy schedules to see this demonstration today of Octopai’s capabilities. Again, I’d like to invite you to contact us if you have any additional questions or schedule a demo on our website.

With that, I like to conclude today’s session, and thank you all for your participation. Wish you all to keep safe in these interesting times. Thank you. Take care.

Announcement ! We are happy to share that Octopai has been acquired by Cloudera