How to Know if You Need a Data Dictionary, Business Glossary or Catalog

There’s SO much confusion out there about the difference between a Data Dictionary, a Business Glossary, and a Data Catalog. Well, we’re here to help clear it all up. Watch Malcolm Chisholm, Ph.D., President of Data Millennium, and Amnon Drori, CEO & Co-Founder of Octopai, as they help you understand what is right for your organization’s needs and challenges.

Video Transcript

Amnon Drori: Great. I guess we can start. Malcolm, can you hear me?

Malcolm Chisholm: I can, Amnon. Can you hear me?

Amnon: Yes. We’re ready to go.

Malcolm: We’re good.

Amnon: Great. Thank you, everyone, for choosing to be with us today, and thank you for joining. My name is Amnon Drori, I’m co-founder of Octopai. Joining me today is Malcolm Chisholm, President of Data Millennium. The reason we wanted to have this webinar today is that, during our journey in the past couple of years talking to hundreds of BI professionals, there’s one topic that we hear again and again that is still unclear and creates some confusion: the topic we’re going to talk about today, data dictionary, business glossary, data catalog, BI catalog, and all these terms.

Given our experience and talking to, really, hundreds of BI professionals, we thought that it would be valuable to spend the time today with Malcolm so he can share his view, from his experience, trying to shed some light about what he thinks are the differences between each and every product, and what exactly is the time that you really know that you need all of them, one of them, or some of them? Malcolm, we can start.

Malcolm: Thanks, Amnon.

Amnon: Sure. Can you see my screen?

Malcolm: Yes.

Amnon: I just want to mention that we have a very dedicated whitepaper talking about this entire topic. I invite you, everyone, to go to octopai.com and download this whitepaper. What we would like to cover in the next 30, 35 minutes, is highlights of this whitepaper that Malcolm wrote for us. Let’s start with everyone’s participating in some kind of exercise to start the morning, to start the session. The topic of today is trying to understand the differences between data dictionary, business glossary, and data catalog, and when you need them.

It would be really cool if you can participate in a poll, and Melissa’s going to pop it right now, is, could you share your experience about do you really understand the differences between all of them? Or kind of understand the differences between all these three tools or you’re lost? Or from your perspective of what you know, they’re all the same? Take a couple of seconds to answer this. We have another two polls down the road of the session, and we’re going to share all the results at the end of the session so you can see what your peers have answered.

Melissa, we can move forward. Malcolm, why now?

Malcolm: Why are people asking this big question now about these different kinds of tools? Because now is different. If we go to the next slide, we can illustrate it a little bit. We’re living, essentially, in the golden age of data, but it wasn’t like this. Decades ago, computerization came in and the idea was to automate what, in those days, were manual processes. Today, we’re looking at process optimization, but the emphasis for several decades was really on automation. Data was considered like exhaust that came out the back end of applications. It was a by-product. People weren’t really interested in it all that much.

There was a need for BI activities to support automation, operation reporting, things like that, and, again, because there was no emphasis on data, the volumes were relatively small and it was chiefly internal to the enterprise. Well, if we fast forward to today, beginning in the 1990s but picking up in the 2000s, it’s all changed. We’re in the golden age of data. What has happened is that data’s increasingly important, it’s at the heart of business models.

You can think of very successful companies like Facebook, Amazon, et cetera, all about the data. Then BI has grown and evolved. I think I would credit Amnon with his insight into the social evolution of BI, that it is now a domain. It’s not just a team that supports the final step of data visualization. First off, it’s diffused throughout the enterprise because everybody needs insights.

Also, its tentacles, as it were, reach out into that ecosystem of where data is being produced, transported, and transformed, and as it winds its way to that final step, which is still important, data visualization, the volumes have increased. They’re massive today compared to what they were just a while back, and they’re growing. The complexity is growing because the business is demanding more and more from BI. It’s a virtuous circle, from a BI professional standpoint: we need more and more BI to unlock more and more value from the data, and the data’s more complex, and there’s more of it.

With this gigantic swirl, you’re going to need tooling to help you. You’re going to need to answer these questions about, “I just can’t sit here and do this all in some ad hoc, manual fashion like I could decades ago. I do need some kind of tooling support.” The questions are coming to the fore. “Okay, what is it? Is it a business glossary? Is it a data dictionary? Is it catalog? What is it? I need it, and I need it now. I understand I didn’t need it before, but today is different.” That’s, I think, how we’ve got to where we are today.

Amnon: Just from our perspective, and correct me if I’m wrong, when we refer to BI, we’re not only talking about what is known as the “BI tool,” which is the reporting tool. When we refer to BI, we basically refer to the entire landscape that, under the management of great BI professionals, includes ETLs, SQL scripts, data warehouse, databases, analysis services, and reporting tools from different vendors. The collection of these tools together is what we call the “BI landscape” or the “BI professional,” right?

Malcolm: Right, absolutely. That was what I was referring to, about the tentacles reaching out way back from that final step of the visualization layer into, frankly, what’s been an organic growth adding to this complexity of this architecture. It’s all of that today, it’s absolutely not just this final– the last mile, as it were.

Amnon: You talk, also, in the paper, about the BI landscape and the BI intelligence that could be derived from understanding the entire landscape. What exactly do you mean about “BI intelligence”?

Malcolm: What’s intelligence? It’s actionable knowledge that you have available right now to do something with. As a BI professional, you don’t want to receive a requirement or a maintenance request, or a user reporting a problem, and then that’s the starting point for some journey into the unknown of exploration to find all this stuff out because, based on what we’ve just said, that’s not going to scale today. There’s too much data, too much complexity, too much demand from the business. You’ve got to have at hand, today, the knowledge that you can use to perform your work as a BI professional.

You just don’t have this luxury of being able to do this manually anymore. It has to actually be automated, and while I’m sure we’ll come to that, but it’s also, you can’t rely on friends and family, tribal knowledge. You’ve got to have this knowledge about the data, about everything you were saying, Amnon. Where is it out there? What platforms is it in? How can I get it? How is it coming to me? How is it being transformed? What am I looking at? What does it mean? What are the known problems with it? All of this has to be available to you. If you don’t have that, you’re just not going to be able to keep up with what the business is now demanding.

Amnon: In order to be more granular in understanding the differences, can you lead us to understand better, from your perspective, what are the differences between the business glossary, dictionary, and catalog?

Malcolm: I think if we go to the next slide, we’ve gotten it called out. Now, I’ll talk a little bit about the stakeholder communities, too. If you’ve got a business glossary, what’s a business glossary? It is a way in which you can trap business meaning and facts of business significance about business concepts.

They don’t actually always have to be data elements, which will end up as columns in databases, or entities that will end up as tables and databases. They can be about just abstract business concepts and we can describe them. All of that is in the business glossary. I think that part of the trick with the business glossary is to get the content really well-managed, to make it useful, actionable. We can’t think of it in just the terms of a dictionary we’ve seen in school with a tiny little definition, “Oh, and we’re done,” and that’s it. It’s much more than that today. For a BI professional, it has to be.

We’re going to use it, for instance, to understand what exactly the user is talking about when they give us some kind of requirement because the requirements might be quite vague. We want to check them against something where we have the true meaning of what these terms are, and the terminology, too, is in the business glossary, and that often gets lost.

What are the right terms to use?

They’re going to end up as labels in dashboards, reports, and screens. What one should I be using to convey the right meaning to the users? And particularly in areas like metrics where you want to have, not just the definitions, but the methodologies about how this data is collected, that’s going to be important, too, for the interpretation leading to the insights of what we supply as BI professionals.

For instance, one metric I came across a while back was traffic going into sales centers. Well, how’s that measured? Is it everybody who walks in? Is it just people who’ve left their names for follow-ups? Was it families? Was it pets? What was it? All that goes into a business glossary. The BI catalog has a large element of looking at that data visualization, that reporting layer, and capturing what’s there in terms of, what reports do we have? What’s on them? What’s included in them? What transformations happen in them?

There’s going to be more to that, so we’ll come back to that in a minute, but that’s also important for, again, as BI professionals, we want to know if I’m asked to develop something, does it exist already? And, quite frankly, another part of BI is the administrative aspect of it. We don’t want to keep perpetuating reports that no longer have a use. That part of it is also useful in a BI catalog but, again, I’m going to come back to that again in a moment.

The data dictionary is the physical data store assets that we have out there, which would be typically schemas, tables, columns, their technical characteristics, like their data types if they’re null, and so on and so forth. That is our entire data landscape in its physical, technical implementation sense, and that can be an awful lot of information. Data discovery is capabilities on top of that that allow us to find data, to find things within that dictionary, and the dictionary might, today, also have additional profiling information about data in it as well. Not just the static, technical aspects, but more information about the kinds of values, ranges, numbers of discrete values, things like that as well.
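To make the data dictionary idea concrete, here is a minimal Python sketch, assuming a SQLite database as the physical store. It collects the technical characteristics mentioned above (table, column, data type, nullability) and adds simple profiling (distinct values, minimum, maximum). The `employee` table and its rows are hypothetical demo data, and `build_data_dictionary` is an illustrative helper, not a feature of any tool discussed in the webinar.

```python
import sqlite3


def build_data_dictionary(conn):
    """Describe every column in a SQLite database: technical traits plus basic profiling."""
    entries = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()]
    for table in tables:
        # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        for _cid, column, data_type, notnull, _default, _pk in columns:
            # Simple profiling: number of distinct values and the value range.
            distinct, low, high = conn.execute(
                f'SELECT COUNT(DISTINCT "{column}"), MIN("{column}"), MAX("{column}") '
                f'FROM "{table}"'
            ).fetchone()
            entries.append({
                "table": table,
                "column": column,
                "data_type": data_type,
                "nullable": not notnull,
                "distinct_values": distinct,
                "min": low,
                "max": high,
            })
    return entries


# Hypothetical demo data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (employee_id INTEGER NOT NULL, country TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "CA", 70000.0), (2, "US", 82000.0), (3, "US", 64000.0)],
)
dictionary = build_data_dictionary(conn)
for entry in dictionary:
    print(entry)
```

In practice the same information would come from each platform's own system catalog; the point of the sketch is only that the dictionary is harvested from the physical store rather than typed in by hand.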

Now we’ve got data discovery, very useful to a BI professional who’s looking for data, and we want to help BI professionals to get to the right source of data, the best source of data with the most confidence, in the shortest time possible, so that supplies that. Then, finally, we have the big, very difficult nut to crack, which is data lineage. How is the data flowing across all of this typically very large landscape? And how’s it going across platforms through, again, what you referred to earlier, Amnon, like SQL scripts, stored procedures, ETL tools, and so on?

What goes on in them? How is the data changed through transformation in that? Then I think the dictionary, the data discovery, and the data lineage part also come back into the BI catalog. We want to marry all that together with that information about the data visualization layer. Why? In part, because that cements or joins the technical perspective to the business perspective because in reporting, that’s where data itself becomes information, it becomes business understandable. The reports are so important for bridging that chasm between business understanding and technical implementation.

I think that’s a very rough viewpoint into what they are. In terms of the stakeholders, BI developers are going to be interested in all of these. Business users are more in the business understandable areas of the BI catalog and the glossary. DBAs are more in the technical areas, and these tools are very useful to them, too.

Business analysts, and they include data analysts today as well, span a little bit of all of this in terms of doing things like source data analysis, and data governance analysts, who are more about managing the resources and making sure everybody is fulfilling their roles, are also interested in another subset of them. I think that’s my take on what these things are.

Amnon: If there’s one takeaway that I can derive from your answer is that, in order to provide, really, a corporate solution or try to orchestrate all the different entities that are responsible for the data movement process from the data sources to the data consumers and to govern them, you probably want to have, as an organization, all of them, but synced together so you can have a really coherent understanding of the data journey, not only in the form of lineage but to understand the definitions.

That way, when a business analyst sees a field called “customer,” they can track it all the way to its real meaning, all the way back to the data sources. That’s something that we’ve seen, also, repeatedly happening within organizations, trying to figure out which tool should be used by which user, and now if you combine them all, you can provide a corporate solution.

Malcolm: Right, and I think that’s super important. The scary thing, to me, is if you have these stakeholder groups acting independently and they get just a collection of independent tools, and they don’t sync, as you were saying, Amnon, they don’t sync, you’re going to not get the value that you need out of them. You’re, actually, probably going to have all kinds of problems where you try to have, who knows, manual interfaces or something like that between them. It’s not going to be happy.

Amnon: Focusing a little bit about business glossaries, what are the reasons that you see are pros and cons to actually consider that?

Malcolm: The pros are that the business can interact with the business world. This is their world, that it’s got the meaning that they want. I think that it’s important to have content in there, and this slide illustrates one of the cons. Let me go through that then come back to the pros. You can take a business glossary as a tool, you can go implement it and throw it out there, and say, “Hey, everybody, we’ve got a business glossary, go fill it up with content.” People are very busy, and that’s a big problem, and so they’re not going to find it useful right away.

Why? Because there’s nothing in it, and then they’re going to be asked to put stuff into it. It’s like, “Okay, and how long’s that going to take?” Quite a long time. You’ve got to have some way to have a business glossary with some degree of provisioning of it, maybe it’s automated so that it gets up to a critical mass where it does become useful to the business stakeholders, and after that, is going to be in a steady state because business terms will change, but the rate of change isn’t that high, and it’s going to be also about adding more and more information to it.

This is the big thing you need to think about when you’re having your business glossary, it’s all about the content. Is it sufficiently there in terms of quantity? Is it good in terms of quality? Then the quality aspect of it is, again, don’t just think in terms of definitions, put in all facts of business significance about the concepts that you’re dealing with, most of which are going to be data elements.

Things not just like the definition, whether it’s personal information, whether it’s payment card, industry information, but also these methodological things about how we’re collecting this particular data item.

If there’s rules like, oh, we have “customer first name” and “customer last name,” but when a person only has one name, we put a slash in “customer first name” and put that one name in “customer last name.” All these usage facts, they’re super important for the BI professional.
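A glossary entry that carries these facts of business significance could be modeled roughly as below. This is only a sketch: the `GlossaryTerm` fields and both helper functions are hypothetical names chosen for illustration, and the handling of names with more than one part (first part versus the rest) is an assumption that the webinar does not specify. Only the single-name slash rule comes from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class GlossaryTerm:
    """One business glossary entry: more than a definition."""
    name: str
    definition: str
    is_personal_information: bool = False
    collection_method: str = ""
    usage_rules: List[str] = field(default_factory=list)


first_name = GlossaryTerm(
    name="customer first name",
    definition="The given name of a customer.",
    is_personal_information=True,
    usage_rules=[
        'If a person has only one name, "customer first name" holds "/" '
        'and "customer last name" holds that one name.',
    ],
)


def to_name_columns(name_parts: List[str]) -> Tuple[str, str]:
    """Apply the single-name rule when storing a name (multi-part split is an assumption)."""
    if len(name_parts) == 1:
        return "/", name_parts[0]
    return name_parts[0], " ".join(name_parts[1:])


def display_name(first: str, last: str) -> str:
    """Interpret the stored columns for a report label, honoring the slash rule."""
    return last if first == "/" else f"{first} {last}"


print(to_name_columns(["Cher"]))
print(display_name("/", "Cher"))
```

Without the usage rule recorded in the glossary, a report developer would have no way to know that a slash in the first-name column is a convention rather than bad data.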

Amnon: Let’s move forward with that. You talked, in the previous slide, about the different tools for the different stakeholders, and you mentioned, specifically, about the importance of connecting the understanding about the data, like data dictionary and data catalog, to the data lineage. Can you share a little bit more than that?

Malcolm: Business glossary is great at semantics, we understand what things are, but that doesn’t tell us everything we need to know, unfortunately, by far. One of the other aspects we have to think about is what I call and is called “data universes,” which is the populations of data that we’re using, that we’re working with. Here, we have an example of a pretty simple flow. We have, at the far end, a global listing of all employees in the human resources department.

That’s great. That’s coming out of a global combined employee database, but let’s say we have somebody new in HR and they’re given this, and they’ve got to work with it, and they don’t really know what it’s about. Well, they can go to the data dictionary and look at what an employee is and maybe there’ll be some documentation of the different kinds of contracts that exist and so on and so forth, but they don’t know, still, what’s in it.

If you have data lineage, as we see here, you can trace backwards. You can find that, “Oh, this data is coming from a Canadian employee database and is also coming from a US employee database.” I have a global universe of data made up of a universe of Canadian employees and US employees. Now I know what data I’m working with. Yes, I knew the definition, but without the lineage, I’m not going to know the populations of data in the ultimate dataset that are being pulled into my data visualization layer. I’ve got to have the data lineage to really understand that, there’s no other way.
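The backward trace described here can be sketched as a walk over a small lineage graph. The asset names mirror the example in the discussion; the `UPSTREAM` mapping and the `trace_sources` function are hypothetical and only illustrate the idea, not how any particular lineage tool stores or queries its metadata.

```python
from collections import deque

# Each key is a data asset; the list holds the assets it reads from directly.
UPSTREAM = {
    "Global Employee Listing (HR report)": ["Global Combined Employee Database"],
    "Global Combined Employee Database": [
        "Canadian Employee Database",
        "US Employee Database",
    ],
    "Canadian Employee Database": [],
    "US Employee Database": [],
}


def trace_sources(asset, upstream):
    """Walk lineage backwards (breadth-first) and return the original sources."""
    seen = {asset}
    queue = deque([asset])
    origins = []
    while queue:
        current = queue.popleft()
        parents = upstream.get(current, [])
        if not parents:
            # Nothing feeds this asset, so it is an original source.
            origins.append(current)
        for parent in parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return origins


print(trace_sources("Global Employee Listing (HR report)", UPSTREAM))
```

The result names the two populations, Canadian and US employees, that make up the global universe of data behind the report.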

Amnon: What you’re saying is that where, so far, different tools were used by different people that were silos within the organization, like the data analysts were looking at the reports, and there’s the ETL developers, and the DBAs– What you’re saying is that the world is coming to a more coherent understanding, you need to be able to sync between the tools in order to provide, at the end of the day, one story about what this data means and how you track its source through the different transformations?

Malcolm: Right, this is BI intelligence. You’ve got to know everything that you need to know about the data. The business glossary will tell you meanings, but this part of data lineage serves this purpose to tell you what data you have in terms of the populations, the universes of data. Then each individual has his own concerns, like the data analyst, the DBA, the BI developer, et cetera. You’ve got to have this unified view across everything.

That’s why the tools we’ve been discussing really have to come together, again, be synchronized, like you said. Otherwise, it’s just not enough. One of them isn’t going to do it by itself.

Amnon: Now let’s move to the relationship between the BI catalog, and how does it relate to business glossary?

Malcolm: I think this is, again, where automation is going to come in, because we’re looking, as we just did, at technical flows across the data landscape from the US and Canada to a global report. Okay, that’s fine, but we’re ultimately going to be starting off with a technical layer, or a set of technical layers, like a database where this data resides. I know it’s much more complex, again, as you’ve highlighted, but it’s got to get into a report.

We’ve got data, we’ve got a report, ultimately, here, and we’ve got, “Okay, how do we make sense of that?” Well, the business glossary, through the BI catalog, can take the technical information and put it into a view which is business-understandable. We can see, “Okay, we’ve got “customer first name,” it’s got a synonym of “first name.” It’s in a report, “such and such customer profile” or “customer daily tracking.” It’s coming from this database column, it’s coming from this table.”

You can see that the business glossary, in conjunction with the automation provided by the BI catalog, is fusing together the technical information with the business understanding so that everyone could understand what’s going on. You’ve achieved BI intelligence, not just in some technical way, but even toward the extent that it’s user-understandable.
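One way to picture this fusing of business and technical views is a lookup that starts from a glossary term and returns where it appears and where it physically comes from. The sketch below is hypothetical throughout: the `GLOSSARY` and `REPORT_FIELDS` structures, the `CUSTOMER` table, and the `CUST_FNAME` and `CUST_LNAME` columns are invented for illustration. In a real BI catalog the report-to-column mapping would be harvested automatically rather than written out by hand.

```python
# Business side: glossary terms and their synonyms.
GLOSSARY = {
    "customer first name": {"synonyms": ["first name"]},
}

# Technical side: which report label is fed by which physical column.
REPORT_FIELDS = [
    {"report": "Customer Profile", "label": "first name",
     "table": "CUSTOMER", "column": "CUST_FNAME"},
    {"report": "Customer Daily Tracking", "label": "customer first name",
     "table": "CUSTOMER", "column": "CUST_FNAME"},
    {"report": "Customer Profile", "label": "last name",
     "table": "CUSTOMER", "column": "CUST_LNAME"},
]


def catalog_view(term):
    """Join a glossary term (and its synonyms) to the reports and columns behind it."""
    synonyms = GLOSSARY[term]["synonyms"]
    names = {term, *synonyms}
    matches = [f for f in REPORT_FIELDS if f["label"].lower() in names]
    return {
        "term": term,
        "synonyms": synonyms,
        "reports": sorted({f["report"] for f in matches}),
        "physical_columns": sorted({f'{f["table"]}.{f["column"]}' for f in matches}),
    }


view = catalog_view("customer first name")
print(view)
```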

Amnon: If I’m an organization that I, now, do understand a little bit about, why would I need the data catalog, the BI catalog, the business glossary, the data dictionary, and the lineage? And I want to have it, so I guess, at least from what we’ve seen, there’s a lot of work that needs to be put in in order to achieve that.

Malcolm: I got lots of scars from this personally, and I’ve been involved in more projects than I want to remember where we tried top-down projects with manual work and combing through this, and asking the users– These can be successful in a very narrow scope and for solving an immediate problem, but not for BI intelligence. You can’t do that, you need automation. That, again, the scale– yes, and you can see it here.

These are the kinds of results that you’re going to get at the level that matters for BI intelligence. This is showing you a flow through various platforms and ultimately ending up in a report or a dashboard. It is that the complexity has been built up over decades. It just is what it is, and you’re not going to be able to find it without automation. You can try, but a couple of things happen. First off, humans make mistakes in documentation. Even if you’re doing it proactively as a developer, you can still make mistakes.

Also, the scale is just so vast that when you’ve done a little bit of it, you move on to the next bit, chances are the first bit that you’ve touched is going to change and it’s going to be out of date. You’ll always be chasing your tail, you’ll never have anything that you trusted. Then, on top of that, you’ve got complexity. You’ve got problems with going across platforms. It might be Oracle in one place, SQL Server in another, DataStage, SSIS, Informatica ETL, and SQL scripts, stored procedures that are hand-coded.

Do you want to have an expert in each of these come in and play in a manual project? They don’t have enough time. They’re doing better things than trying to do this kind of documentation. That’s before we even get to the reporting layer where there’s all kinds, again, of a whole set of different tools that are used, different technologies and transformations. Other complex events may happen inside of the reporting layer itself.

If you don’t automate this, you’re not going to get anything like the level and reliability of the information you need for what we started out with, which is BI intelligence. Actionable knowledge that you can have confidence in to do your work as a BI professional.

Amnon: I can share that in the past two years, a lot of companies have done a really good job educating the market about the importance of data catalog and data dictionary, business glossary, and trying to explain what each one of them is doing and the value and benefits of using that.

I think that the next step that we see as a barrier for these organizations that do choose to adopt one or some of these tools is how to get that up and running, and the manual work that needs to be in place, and the cost of IT that needs to be put in just to enjoy these kinds of capabilities sometimes becomes the bottleneck and the barrier of adopting these kinds of tools. While you convince organizations about the importance of using these kinds of capabilities, you need to be able to ease the cost and the effort for them to enjoy, finally, those kinds of capabilities that you worked so hard to convince them they need.

I think that triggering organizations to want these capabilities, but not making it easy for them to get there, is a sure blocker. We’ve seen automation as a natural thing that organizations talk about every day: “I like your functionality, but what would it take for me to start using it or enjoying it?” People want to get rid of this kind of manual work, they want to have a magical button saying, “I’m closing my eyes. I’m pressing a button. Boom, it’s there.” I couldn’t agree more, I think that we see that almost every day. Now let me put you in a corner. You’re talking about the importance of each and every one of these tools, and you explained, very nicely, why we need all of them, but if you were an organization that needs to choose one, what is your recommendation? Which one should I choose?

Malcolm: If I had to– if you’ve put me on the spot, which you’re doing– if I could only get one capability, it would be the catalog because I think that’s going to give me that view into my BI assets. Frankly, I think, also the modern catalogs do have lineage capabilities, at least to some extent. If I had to give up everything else and just go for one, it would be that, but I would like, obviously, a tool that does them all and does them in an automated fashion. Based on what you were just saying, I’d also like it to be easy to implement, like it was in the cloud or something, rather than having to mess around on-prem too much.

That is a tough question to answer, and I do get it, that it’s a serious question for many organizations, but I think the one that’s going to give you the biggest bang for your buck, even though it’s not the holistic solution, is going to be the catalog.

Amnon: You mentioned, earlier, about each one of these tools and the collaboration between those tools, and you shared your view. I would like to address the audience for a second. I want to bring up the second poll out of the three that we’re going to use today. From your experience and what you’ve heard today, and Melissa can pop up the poll, how much do you believe the data dictionary, business glossary, catalogs, and data lineage should all collaborate together? Do you think that they absolutely must collaborate in order to provide a really coherent, 360 view of the data journey and data? Meaning, you think it is important, but it’s not a must?

Do you think it would be nice to have? Or you don’t see any reason for them to collaborate and we’re in a silo-based mode? Take a few seconds to answer this and we can move forward. I don’t know how much everybody knows Octopai. We’ve been around for the past four years. After working with BI as leaders in BI organizations in telecom, insurance, banking, and healthcare, we very much believe that BI is not just a team or a group, it’s a domain that has grown in importance within the entire organization, and basically connects the data sources to the data consumers by running this through the BI.

There are so many different challenges there. When you combine the use cases that we as BI professionals run through every day– from day-to-day operation to understand if the report is showing accurate data, all the way to running a project of moving an on-prem reporting tool to a cloud-based reporting tool that can take two to three years’ project, and anything in-between, we believe that you should be able to really get intelligence about what’s going on in your BI. The derivative of this theme creates different tools that enable you to really and properly manage your BI, anything from the operational side to things you don’t know that you don’t know about your BI.

To illustrate that, what we present in our theme is the ability to consolidate different BI tools into a central place, and understand the information about the data from these kinds of tools. You can see a whole bunch of them representing a collection of tools that are responsible for shipping the data from the data sources to the data consumers, but also, very notably, you can see different tools of different vendors. Now, it’s become a salad of different tools that are growing in number and it just becomes more and more difficult to do your job. We leveraged the cloud, and its power and elasticity, to really create a crowd-wisdom or “cloud wisdom.”

We have analyzed, so far, thousands of different BI systems in order to provide the analysis for you. By the way, it takes about 30 minutes or 45 minutes of your time just to extract, upload, and then enable us to analyze that. I think that the power of analyzing cross-platform metadata from so many different tools allows you to enjoy the analyzed metadata in different shapes of tools. Some of them we talked about today, and some of them are not within this solution at Octopai, and collaborate, for example, with external or other vendors’ data catalog or data governance, and so on and so forth.

You can really deal with this list of use cases that we’ve seen very much represent the users that end up spending their time looking, investigating their BI landscape almost on a daily basis. I think that what we’ve seen as a very important vehicle to provide all of that is to think differently or to challenge the status quo by analyzing the entire cross-platform, and you don’t need to do it yourself. Very easy to start. Malcolm talked about it, about everybody is intrigued to use new tools, but then the cost of getting them up and running is very painful, to a certain extent, and you want to make their life easy.

There’s no point of triggering organizations to use functionality but they need to work so hard just to get it. You want to make it easy and simple. I’m going to jump into the last question or to the last poll that I would like to share with you.

From what you’ve heard today of the complexity and what you go through on an everyday basis, looking forward to what your organization would need from the BI, do you think that automation is the key to drive your organization from BI operation to BI intelligence, to the next level? Do you think that you can still continue to do things manually or somewhat more manually, or you think that automation is the game-changer to cross the chasm and take you to the next step?

Do you think it’s a must? Do you think, to a certain extent, it is important? Is it an option, but not a must? Or do you really disagree, and think automation has no room in our life at this level of complexity? Take another few seconds to answer this.

[silence]

With that third poll, I would like to move to the results. Just a quick reminder that, again, don’t forget to download the whitepaper that talks more in detail about what we tried to share with you today, and I’m going to ask Melissa to share the results if possible? Great. Just a quick reminder. The first question, can you see the results? Malcolm, can you see it?

Malcolm: Yes, absolutely. Yes, they’re up.

Amnon: How strongly do you feel that you really understand the differences between all of the tools that we talked about today? Well, you can see that more than 70% are towards “I somewhat understand” or “I’m really lost”. So the topic of today seems to be on point. Malcolm, do you want to add something about this?

Malcolm: Yes, that’s not what I expected, to be honest with you, Amnon. There’s been a lot of marketing buzz out there about all of these tools, and I think that it’s not serving the market well, so we do need to find ways to clarify what these differences are. Maybe the thought leaders and the consulting community also need to, I think, do a better job of highlighting just what the differences are and what their significance is. That is a bit of a shocker to me, actually.

Amnon: On my side, I suspected that the topic we’re trying to help educate the market about today represents what we see: that a lot of organizations still don’t really understand the differences between the tools. On the one hand, since they don’t understand, they’re hesitant to decide which tool to buy, but it’s also our responsibility as vendors to try to clarify this and help build some kind of understanding about which tool does what, so we can help our customers make the right decision. Melissa, can we go to the next poll?

The next one, the second one out of the three, was, “How much do you believe the data dictionary, business glossary, and the rest of the tools should collaborate?” Well, “they must collaborate” got 50%, and then there’s “it is important for them to collaborate.” Together, almost 92% said that these tools should collaborate, and probably, it represents the concept of, “No more silos within the organization. We need to have a coherent understanding about the data movement process for all the different stakeholders throughout the organization.”

Malcolm: Yes, I think people are scared of the silos. These have happened in the past as well, metadata repositories that have been siloed, like individual application data dictionaries and things like that, and it’s not good. It’s great to see, very encouraging to see that everybody wants them to collaborate. I think that is more than just the silo elimination.

These tools are qualitatively different like we’ve said, and people want the synergy to be there. That the whole of what they give you, BI intelligence, is going to be greater than the sum of their parts. I think that’s reflective of what we’re seeing here because it’s just so high. 92% is high.

Amnon: Yes, it’s amazing. I think what’s encouraging in that is that people understood that moving away from silos to cross-platform collaboration is the way to make sure that the data is moving properly, but also can be trusted all the way from its technical term to its physical term and business term, at the end of the day.

Moving on to the last one, do you think automation is important or is a key to take your BI operation to the next level? Yes. Again, from my perspective, it’s not a surprise, but it’s really encouraging to understand or to see that the market is ready for automation. What’s your take?

Malcolm: Yes. Again, this is a little bit surprising to me, but in a really good way. It shows a maturity of thinking out there that’s really good. I mean, 82%? That’s high. That’s a heck of a lot of demand for automation. I realize that there’ll always be cases where it’s not so important, but 82% is high. That represents a definite need. People have clearly been thinking about it, and even the “absolutely” answer is at almost 50%. It’s like if you were just proposing manual approaches, that’s not going to work anymore. People have realized that, which is good.

It looks like the distinguished audience we have today, and I’m sure they’re representative of the overall community, is really thinking about this in a very mature way, and there’s just a hunger for automation that is going to have to be met.

Amnon: Great. Thanks for sharing your view on the poll. We’re right on time with our 45 minutes that we wanted to set for ourselves. I’m going to address some of the questions. If you have any questions either to Malcolm or myself, feel free to jump over to the Q&A and put up your questions. I see a couple of them here. Here is a question to you, Malcolm, “Would business analysts as well as BI developers not be interested in data discovery and data lineage? Do you see any reason for that?”

Malcolm: Maybe it’s just reflective of my experience when I put down that I would agree that, why not? Well, let’s just go through the question again, I want to make sure I absolutely got it. It’s in the business analyst–?

Amnon: “Would business analysts as well as BI developers not be interested in this–?”

Malcolm: I think what I’ve seen is that business analysts, and again, I’m just going off my experience, have tended to be less interested in data discovery, data lineage.

I’ve had some difficulty in getting business analysts to think in data terms. They’re much more thinking about– or a lot of them are in the automation world, they don’t really do source data analysis. That could just be reflective of personal experience, the way that I prepared that slide for you, but I think business analysts ought to be interested in data discovery and data lineage for sure. If they want a job in the future, they’d better get into it.

Amnon: Here’s another one for the both of us, “Do you have any real-world numbers from client organizations as to the benefit automation can deliver?” I can share our experience. We provide automation for the analysis of your entire BI. There are some really good quotes and numbers on our website under “use cases,” but if I can give some examples. We have organizations that said that the fact that you can automatically extract metadata from a variety of different tools, centralize them, then run an analysis, and then generate the lineage in three seconds to a certain level of complexity– some had testified that we saved them anything from a few days of work to a few months of work.

The reason is that the savings grow with the complexity of the relationships within the lineage, meaning how many connections you have in the data journey, and you’ve seen one lineage in Malcolm’s example. Some said that for every dollar they spent on Octopai, they saved anywhere between $10 and $20, given the fact that they used automation versus manual work.

I can give you another example. We were involved in a project where one of our customers moved away from OBIEE on-prem to Tableau in the cloud. It was a year-and-a-half project. After they’d done a calculation, they provided us with the info that not only had they delivered the project on time, they were able to save almost $400,000 in terms of the cycles that were budgeted for manually finding out how the data movement process works. Instead, they could just click a button and get the mapping of the data journey, and also compare this with what had been changed, and understand, with the click of a button, what the differences were.
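As a rough illustration of the comparison described above, the following minimal sketch treats a lineage snapshot as a set of source-to-target edges and diffs two snapshots. The asset names and the `Edge` and `diff_lineage` helpers are hypothetical and are not taken from Octopai’s product; a real tool would work on much richer metadata.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """One hop in the data journey: data flows from source to target."""
    source: str
    target: str


def diff_lineage(before: set[Edge], after: set[Edge]) -> dict[str, list[Edge]]:
    """Return the edges that appeared or disappeared between two snapshots."""
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
    }


# Hypothetical snapshots taken before and after a reporting migration.
before = {
    Edge("dw.fact_sales", "obiee.sales_rpd"),
    Edge("obiee.sales_rpd", "report.daily_sales"),
}
after = {
    Edge("dw.fact_sales", "tableau.sales_extract"),
    Edge("tableau.sales_extract", "report.daily_sales"),
}

changes = diff_lineage(before, after)
for kind, edges in changes.items():
    for edge in edges:
        print(f"{kind}: {edge.source} -> {edge.target}")
```

Keeping each snapshot as a plain set makes the comparison a pair of set differences, which is cheap to rerun every time the metadata is re-synced.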

The automation part provides you the ability to always sync the metadata at the latest version, analyze it, click a button, and get a result. Do you want to add something about this, Malcolm?

Malcolm: Yes, I’ll try and give a roundabout answer. I haven’t got the exact stats that you have, Amnon, and I’ll be a little bit controversial here. Why not? If we think about data-centric projects, like master data management, data warehouses, things like that, they tend to have a very high failure rate. Could be 70% or higher, okay? Now, you can argue about that, and people will, but that’s my viewpoint.

We take that, and then, you look into projects where you have thinking still going back to the 1960s of the systems development lifecycle, the Waterfall, even if it’s done as Agile where every drop of the waterfall has its own waterfall, people still think, “Oh, there’s an analysis phase in my project and I’ll allocate 20%, maybe no more than that to it.” They put that into their project and it’s all going to be done manually, by source data analysis, by people.

Then when the time is up, you’ve expended your 20%, “Okay, we’ll move on to the next phase of the project now.” You don’t do enough because it’s not automated and that amount of time is never going to be enough, so what happens? The project proceeds, you get to user-acceptance testing, the stuff that comes out of that point is all wrong, and your project goes into this bucket of 70% to 80% failures, okay?

I think in those terms, if you don’t have automation for that analysis phase, you’re going to run an extremely high risk of not delivering your project.

Amnon: Yes, not delivering your project, maybe not on time, maybe over budget. We have time for maybe another one or two questions. Here’s a question for me, “Is there an on-premise version of Octopai as well as cloud SaaS?” The answer’s no. In the beginning of our journey as a company, we had a solution for on-premise and we figured out that it’s not the best way to deliver the solution. We develop our product with great features almost on a weekly or bi-weekly basis, and given the inability to manage these kinds of solutions on-prem and the expense it becomes for the organization, we felt that it’s not the right way.

We also believe that cloud is, for some organizations, the current way to really go to the next level. Also, from our perspective, since we analyze metadata and not data, we haven’t seen an organization that actually declined the cloud solution. Yes, we had to go through some kind of security evaluations, but if you look at octopai.com and you scroll down, you can see banks, insurance, healthcare, pharma, telecom, manufacturing, universities, a whole bunch of different industries that suffer from the same inability to address their use cases, and the fact that these are on metadata makes them feel comfortable that they can upload this and run this in the cloud.

Some other questions, maybe the last question about automation? “How do you start doing this? How do you convey that to the DevOps? And what does it mean for them to start using automation?”

Malcolm: Do you want me to try this one, Amnon?

Amnon: Please.

Malcolm: I’m sat there as DevOps and I’m confronted, all of a sudden, with a problem. Something broke, something went wrong, and I’ve got to jump and figure out the answer. If I have automation, I stand a good chance of figuring out first off, if I have a broken process, what’s downstream of it that’s going to be affected? I want to be able to tell my users that something is broken before they find out for themselves.

To me, that’s super important. Then I can sit there and say, “An ETL process broke or a job broke, let me look upstream to see what’s going on and where I can possibly localize the problem. Let me talk to those folks to see if maybe there was something that changed in the data or something went on that I wasn’t informed about, and that’s what broke my ETL job.”

Those are, I think, examples of how DevOps can benefit from this. In terms of change control, I would say when you’re documenting something as part of your change control, have lineage, have the BI glossary, have whatever it is that can be automatically generated as part of that package that goes into the change control. Again, if you’re relying on people to do this, it takes them forever, they make mistakes, they take shortcuts, and they’re not too interested in doing post-facto documentation, so it can relieve that burden, too.

Amnon: Just to give you an example that we’ve seen some of our clients actually using. If I’m an ETL developer or I manage my ETL, and I come to work in the morning and there’s two ETL processes that fail, it would be great not to guess which other reports for which users are either down or not showing proper numbers before I start getting phone calls from the support of business users complaining about something wrong with a report.

For that, if an ETL process is down, I want to run an automated analysis from that ETL downstream to understand, automatically, through lineage, which are the reports that are dependent on that specific ETL, and not to guess.
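The impact analysis discussed in this exchange can be sketched as a traversal of a lineage graph: walk downstream from a failed ETL process to find the reports that depend on it, and walk upstream to find where the problem might have originated. This is only a minimal sketch under stated assumptions. The edge list, the asset names, and the `report.` naming prefix are all hypothetical, and it does not describe how any particular product implements lineage.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (upstream asset, downstream asset).
LINEAGE = [
    ("db.crm.customers", "etl.load_customers"),
    ("etl.load_customers", "dw.dim_customer"),
    ("dw.dim_customer", "report.customer_profile"),
    ("dw.dim_customer", "report.customer_daily_tracking"),
    ("db.erp.orders", "etl.load_orders"),
    ("etl.load_orders", "dw.fact_orders"),
    ("dw.fact_orders", "report.customer_daily_tracking"),
]


def build_index(edges):
    """Index the edges in both directions for fast neighbor lookups."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def reachable(start, neighbors):
    """Breadth-first search: every asset reachable from start."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def affected_reports(failed_etl, edges):
    """Reports downstream of a failed ETL process (assumed 'report.' prefix)."""
    downstream, _ = build_index(edges)
    hits = reachable(failed_etl, downstream)
    return sorted(n for n in hits if n.startswith("report."))


def upstream_sources(failed_etl, edges):
    """Everything upstream of the failed ETL, to help localize the problem."""
    _, upstream = build_index(edges)
    return sorted(reachable(failed_etl, upstream))


print(affected_reports("etl.load_customers", LINEAGE))
print(upstream_sources("etl.load_customers", LINEAGE))
```

With a graph like this available, the morning check becomes a lookup per failed job rather than a guess, and the same index serves both directions of the question.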

Yes, I think that we’re right on the hour. We have a couple more questions, but we do have your contact details, so I’m going to make sure that we address them on a personal level. If you have any questions or you’re interested in touching base with us, here’s my personal email. I’ll be more than happy to address any questions, or if you would like to see more, either with us or with Malcolm. I really would like to thank you, Malcolm, for joining today and sharing your experiences and your knowledge. Very, very valuable. Thank you.

Malcolm: Thank you for inviting me, Amnon.

Amnon: I would like to thank everyone who joined today. From our side of the world, keep safe, stay safe, stay healthy. Have a great rest of the week and thank you for joining today.

Malcolm: Thanks, guys. Bye.

Video Transcript

Amnon Drori: Great. I guess we can start. Malcolm, can you hear me?

Malcolm Chisholm: I can, Amnon. Can you hear me?

Amnon: Yes. We’re ready to go.

Malcolm: We’re good.

Amnon: Great. Thank you, everyone, for choosing to be with us today, and thank you for joining. My name is Amnon Drori, I’m co-founder of Octopai. Joining me today is Malcolm Chisholm, President of Data Millennium. The reason we wanted to have this webinar today is, during our journey in the past couple of years talking to hundreds of BI professionals, it looks that there’s one topic that we hear again and again, and again that is still unclear and creates some confusion regarding the topic we’re going to talk today, about data dictionary, business glossary, data catalog, BI catalog, and all these terms.

Given our experience and talking to, really, hundreds of BI professionals, we thought that it would be valuable to spend the time today with Malcolm so he can share his view, from his experience, trying to shed some light about what he thinks are the differences between each and every product, and what exactly is the time that you really know that you need all of them, one of them, or some of them? Malcolm, we can start.

Malcolm: Thanks, Amnon.

Amnon: Sure. Can you see my screen?

Malcolm: Yes.

Amnon: I just want to mention that we have a very dedicated whitepaper talking about this entire topic. I invite you, everyone, to go to octopai.com and download this whitepaper. What we would like to cover in the next 30, 35 minutes, is highlights of this whitepaper that Malcolm wrote for us. Let’s start with everyone’s participating in some kind of exercise to start the morning, to start the session. The topic of today is trying to understand the differences between data dictionary, business glossary, and data catalog, and when you need them.

It would be really cool if you can participate in a poll, and Melissa’s going to pop it right now, is, could you share your experience about do you really understand the differences between all of them? Or kind of understand the differences between all these three tools or you’re lost? Or from your perspective of what you know, they’re all the same? Take a couple of seconds to answer this. We have another two polls down the road of the session, and we’re going to share all the results at the end of the session so you can see what your peers have answered.

Melissa, we can move forward. Malcom, why now?

Malcolm: Why are people asking this big question now about these different kinds of tools? Because now is different. If we go to the next slide, we can illustrate it a little bit. We’re living, essentially, in the golden age of data, but it wasn’t like this. Decades ago, computerization came in and the idea was to automate what, in those days, were manual processes. Today, we’re looking at process optimization, but the emphasis for several decades was really on automation. Data was considered like exhaust that came out the back end of applications. It was a by-product. People weren’t really interested in it all that much.

There was a need for BI activities to support automation, operation reporting, things like that, and, again, because there was no emphasis on data, the volumes were relatively small and it was chiefly internal to the enterprise. Well, if we fast forward to today, beginning in the 1990s but picking up in the 2000s, it’s all changed. We’re in the golden age of data. What has happened is that data’s increasingly important, it’s at the heart of business models.

You can think of very successful companies like Facebook, Amazon, et cetera, all about the data. Then BI has grown and evolved. I think I would credit Amnon with his insight into the social evolution of BI, that it is now a domain. It’s not just a team that supports the final step of data visualization. First off, it’s diffused throughout the enterprise because everybody needs insights.

Also, its tentacles, as it were, reach out into that ecosystem of where data is being produced, transported, and transformed, and as it winds its way to that final step, which is still important, data visualization, the volumes have increased. That they’re massive today compared to what they were just a while back, and they’re growing. The complexity is growing because the business is demanding more and more from BI. It’s a virtuous circle, from a BI professional standpoint, of we need more and more BI to unlock more and more value from the data, and the data’s more complex, and there’s more of it.

With this gigantic swirl, you’re going to need tooling to help you. You’re going to need to answer these questions about, “I just can’t sit here and do this all in some ad hoc, manual fashion like I could decades ago. I do need some kind of tooling support.” The questions are coming to the fore. “Okay, what is it? Is it a business glossary? Is it a data dictionary? Is it catalog? What is it? I need it, and I need it now. I understand I didn’t need it before, but today is different.” That’s, I think, how we’ve got to where we are today.

Amnon: Just from our perspective, and correct me if I’m wrong, when we refer to BI, we’re not only talking about what is known as the “BI tool,” which is the reporting tool. When we refer to BI, we basically refer to the entire landscape that, under the management of great BI professionals, includes ETLs, SQL scripts, data warehouse, databases, analysis services, and reporting tools from different vendors. The collection of these tools together is what we call the “BI landscape” or the “BI professional,” right?

Malcolm: Right, absolutely. That was what I was referring to, about the tentacles reaching out way back from that final step of the visualization layer into, frankly, what’s been an organic growth adding to this complexity of this architecture. It’s all of that today, it’s absolutely not just this final– the last mile, as it were.

Amnon: You talk, also, in the paper, about the BI landscape and the BI intelligence that could be derived from understanding the entire landscape. What exactly do you mean about “BI intelligence”?

Malcolm: What’s intelligence? It’s actionable knowledge that you have available right now to do something with. As a BI professional, you don’t want to receive a requirement or a maintenance request, or a user reporting a problem, and then that’s the starting point for some journey into the unknown of exploration to find all this stuff out because, based on what we’ve just said, that’s not going to scale today. There’s too much data, too much complexity, too much demand from the business. You’ve got to have at hand, today, the knowledge that you can use to perform your work as a BI professional.

You just don’t have this luxury of being able to do this manually anymore. It has to actually be automated, and while I’m sure we’ll come to that, but it’s also, you can’t rely on friends and family, tribal knowledge. You’ve got to have this knowledge about the data, about everything you were saying, Amnon. Where is it out there? What platforms is it in? How can I get it? How is it coming to me? How is it being transformed? What am I looking at? What does it mean? What are the known problems with it? All of this has to be available to you. If you don’t have that, you’re just not going to be able to keep up with what the business is now demanding.

Amnon: In order to be more granular in understanding the differences, can you lead us to understand better, from your perspective, what are the differences between the business glossary, dictionary, and catalog?

Malcolm: I think if we go to the next slide, we’ve gotten it called out. Now, I’ll talk a little bit about the stakeholder communities, too. If you’ve got a business glossary, what’s a business glossary? It is a way in which you can trap business meaning and facts of business significance about business concepts.

They don’t actually always have to be data elements, which will end up as columns in databases, or entities that will end up as tables and databases. They can be about just abstract business concepts and we can describe them. All of that is in the business glossary. I think that part of the trick with the business glossary is to get the content really well-managed, to make it useful, actionable. We can’t think of it in just the terms of a dictionary we’ve seen in school with a tiny little definition, “Oh, and we’re done,” and that’s it. It’s much more than that today. For a BI professional, it has to be.

We’re going to use it, for instance, to understand what exactly the user is talking about when they give us some kind of requirement because the requirements might be quite vague. We want to check them against something where we have the true meaning of what these terms are, and the terminology, too, is in the business glossary, and that often gets lost.

What are the right terms to use?

They’re going to end up as labels in dashboards, reports, and screens. What one should I be using to convey the right meaning to the users? And particularly in areas like metrics where you want to have, not just the definitions, but the methodologies about how this data is collected, that’s going to be important, too, for the interpretation leading to the insights of what we supply as BI professionals.

For instance, one metric I came across a while back was traffic going into sales centers. Well, how’s that measured? Is it everybody who walks in? Is it just people who’ve left their names for follow-ups? Was it families? Was it pets? What was it? All that goes into a business glossary. The BI catalog has a large element of looking at that data visualization, that reporting layer, and capturing what’s there in terms of, what reports do we have? What’s on them? What’s included in them? What transformations happen in them?

There’s going to be more to that, so we’ll come back to that in a minute, but that’s also important for, again, as BI professionals, we want to know if I’m asked to develop something, does it exist already? And, quite frankly, another part of BI is the administrative aspect of it. We don’t want to keep perpetuating reports that no longer have a use. That part of it is also useful in a BI catalog but, again, I’m going to come back to that again in a moment.

The data dictionary is the physical data store assets that we have out there, which would be typically schemas, tables, columns, their technical characteristics, like their data types if they’re null, and so on and so forth. That is our entire data landscape in its physical, technical implementation sense, and that can be an awful lot of information. Data discovery is capabilities on top of that that allow us to find data, to find things within that dictionary, and the dictionary might, today, also have additional profiling information about data in it as well. Not just the static, technical aspects, but more information about the kinds of values, ranges, numbers of discrete values, things like that as well.

Now we’ve got data discovery, very useful to a BI professional who’s looking for data, and we want to help BI professionals to get to the right source of data, the best source of data with the most confidence, in the shortest time possible, so that supplies that. Then, finally, we have the big, very difficult nut to crack, which is data lineage. How is the data flowing across all of this typically very large landscape? And how’s it going across platforms through, again, what you referred to earlier, Amnon, like SQL scripts, store procedures, ETL tools, and so on?

What goes on in them? How is the data changed through transformation in that? Then I think the dictionary, the data discovery, and the data lineage part also come back into the BI catalog. We want to marry all that together with that information about the data visualization layer. Why? In part, because that cements or joins the technical perspective to the business perspective because in reporting, that’s where data itself becomes information, it becomes business understandable. It’s the reports are so important for bridging that chasm between business understanding and technical implementation.

I think that’s a very rough viewpoint into what they are. In terms of the stakeholders, BI developers are going to be interested in all of these. Business users are more in the business understandable areas of the BI catalog and the glossary. DBAs are more in the technical areas, and these tools are very useful to them, too.

Business analysts– and they include data analysts today as well, span a little a bit of all of this in terms of doing things like source data analysis, and data governance analysts who are more about managing the resources and making sure everybody is fulfilling their roles, they’re also interested in another subset of them. I think that’s my take on what these things are.

Amnon: If there’s one takeaway that I can derive from your answer is that, in order to provide, really, a corporate solution or try to orchestrate all the different entities that are responsible for the data movement process from the data sources to the data consumers and to govern them, you probably want to have, as an organization, all of them, but synced together so you can have a really coherent understanding of the data journey, not only in the form of lineage but to understand the definitions.

That when a business analyst sees a field called “customer,” it’s you can track this all the way to its real meaning, all the way back to the data sources. That’s something that we’ve seen, also, repeatedly happening within organizations, trying to understand, to figure out which tool to be used for which user, and now if you combine them all, you can provide a corporate solution.

Malcolm: Right, and I think that’s super important. The scary thing, to me, is if you have these stakeholder groups acting independently and they get just a collection of independent tools, and they don’t sync, as you were saying, Amnon, they don’t sync, you’re going to not get the value that you need out of them. You’re, actually, probably going to have all kinds of problems where you try to have, who knows, manual interfaces or something like that between them. It’s not going to be happy.

Amnon: Focusing a little bit about business glossaries, what are the reasons that you see are pros and cons to actually consider that?

Malcolm: The pros are that the business can interact with the business world. This is their world, that it’s got the meaning that they want. I think that it’s important to have content in there, and this slide illustrates one of the cons. Let me go through that then come back to the pros. You can take a business glossary as a tool, you can go implement it and throw it out there, and say, “Hey, everybody, we’ve got a business glossary, go fill it up with content.” People are very busy, and that’s a big problem, and so they’re not going to find it useful right away.

Why? Because there’s nothing in it, and then they’re going to be asked to put stuff into it. It’s like, “Okay, and how long’s that going to take?” Quite a long time. You’ve got to have some way to have a business glossary with some degree of provisioning of it, maybe it’s automated so that it gets up to a critical mass where it does become useful to the business stakeholders, and after that, is going to be in a steady state because business terms will change, but the rate of change isn’t that high, and it’s going to be also about adding more and more information to it.

This is the big thing you need to think about when you’re having your business glossary, it’s all about the content. Is it sufficiently there in terms of quantity? Is it good in terms of quality? Then the quality aspect of it is, again, don’t just think in terms of definitions, put in all facts of business significance about the concepts that you’re dealing with, most of which are going to be data elements.

Things not just like the definition, whether it’s personal information, whether it’s payment card, industry information, but also these methodological things about how we’re collecting this particular data item.

If there’s rules like, oh, we have “customer first name” and “customer last name,” but when a person only has one name, we put a slash in “customer first name” and put that one name in “customer last name.” All these usage facts, they’re super important for the BI professional.

Amnon: Let’s move forward with that. You talked, in the previous slide, about the different tools for the different stakeholders, and you mentioned, specifically, about the importance of connecting the understanding about the data, like data dictionary and data catalog, to the data lineage. Can you share a little bit more than that?

Malcolm: Business glossary is great at semantics, we understand what things are, but that doesn’t tell us everything we need to know, unfortunately, by far. One of the other aspects we have to think about is what I call and is called “data universes,” which is the populations of data that we’re using, that we’re working with. Here, we have an example of a pretty simple flow. We have, at the far end, a global listing of all employees in the human resources department.

That’s great. That’s coming out of a global combined employee database, but let’s say we have somebody new in HR and they’re given this, and they’ve got to work with it, and they don’t really know what it’s about. Well, they can go to the data dictionary and look at what an employee is and maybe there’ll be some documentation of the different kinds of contracts that exist and so on and so forth, but they don’t know, still, what’s in it.

If you have data lineage, as we see here, you can trace backwards. You can find that, “Oh, this data is coming from a Canadian employee database and is also coming from a US employee database.” I have a global universe of data made up of a universe of Canadian employees and US employees. Now I know what data I’m working with. Yes, I knew the definition, but without the lineage, I’m not going to know the populations of data in the ultimate dataset that are being pulled into my data visualization layer. I’ve got to have the data lineage to really understand that, there’s no other way.

Amnon: What you’re saying is that where, so far, different tools were used by different people that were silos within the organization, like the data analysts were looking at the reports, and there’s the ETL developers, and the DBAs– What you’re saying is that the word is coming to a more coherent understanding, you need to be able to sync between the tools in order to provide, at the end of the day, one story about what this data means and how you track its source through the different transformations?

Malcolm: Right, this is BI intelligence. You’ve got to know everything that you need to know about the data. The business glossary will tell you meanings, but this part of data lineage serves this purpose to tell you what data you have in terms of the populations, the universes of data. Then each individual has his own concerns, like the data analyst, the DBA, the BI developer, et cetera. You’ve got to have this unified view across everything.

That’s why the tools we’ve been discussing really have to come together, again, be synchronized, like you said. Otherwise, it’s just not enough. One of them isn’t going to do it by itself.

Amnon: Now let’s move to the BI catalog. How does it relate to the business glossary?

Malcolm: I think this is, again, where automation is going to come in, because we’re looking, as we just did, at technical flows across the data landscape from the US and Canada to a global report. Okay, that’s fine, but we’re ultimately going to be starting off with a technical layer, or a set of technical layers, like a database where this data resides. I know it’s much more complex, again, as you’ve highlighted, but it’s got to get into a report.

We’ve got data, we’ve got a report, ultimately, here, and we’ve got, “Okay, how do we make sense of that?” Well, the business glossary, through the BI catalog, can take the technical information and put it into a view which is business-understandable. We can see, “Okay, we’ve got “customer first name,” it’s got a synonym of “first name.” It’s in a report, “such and such customer profile” or “customer daily tracking.” It’s coming from this database column, it’s coming from this table.”

You can see that the business glossary, in conjunction with the automation provided by the BI catalog, is fusing together the technical information with the business understanding so that everyone can understand what’s going on. You’ve achieved BI intelligence, not just in some technical way, but even to the extent that it’s user-understandable.
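[Editor's illustration] The link Malcolm describes between a business term, its synonym, the reports it appears in, and the physical column it comes from can be sketched as a small record. This is a minimal Python sketch; the `GlossaryEntry` class, the `lookup` helper, and the table and column names are hypothetical, and only the term, synonym, and report names come from the example he reads out.

```python
from dataclasses import dataclass, field


@dataclass
class GlossaryEntry:
    term: str                                      # business-friendly name
    synonyms: list = field(default_factory=list)
    reports: list = field(default_factory=list)    # reports that show the term
    table: str = ""                                # physical source table
    column: str = ""                               # physical source column


ENTRIES = [
    GlossaryEntry(
        term="customer first name",
        synonyms=["first name"],
        reports=["customer profile", "customer daily tracking"],
        table="CUSTOMER",    # hypothetical table name
        column="FIRST_NM",   # hypothetical column name
    ),
]


def lookup(name, entries=ENTRIES):
    """Find an entry by its term or any synonym, ignoring case."""
    wanted = name.strip().lower()
    for entry in entries:
        names = [entry.term.lower()] + [s.lower() for s in entry.synonyms]
        if wanted in names:
            return entry
    return None


entry = lookup("First Name")
print(entry.term, "comes from", f"{entry.table}.{entry.column}", "and appears in", entry.reports)
```

In a real BI catalog the entries would be populated automatically from scanned metadata rather than typed by hand, which is the point Malcolm makes about automation.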

Amnon: If I’m an organization that now understands a little bit about why I would need the data catalog, the BI catalog, the business glossary, the data dictionary, and the lineage, and I want to have it, then I guess, at least from what we’ve seen, there’s a lot of work that needs to be put in in order to achieve that.

Malcolm: I got lots of scars from this personally, and I’ve been involved in more projects than I want to remember where we tried top-down projects with manual work and combing through this, and asking the users– These can be successful in a very narrow scope and for solving an immediate problem, but not for BI intelligence. You can’t do that, you need automation. That, again, the scale– yes, and you can see it here.

These are the kinds of results that you’re going to get at the level that matters for BI intelligence. This is showing you a flow through various platforms and ultimately ending up in a report or a dashboard. The complexity has been built up over decades. It just is what it is, and you’re not going to be able to find it without automation. You can try, but a couple of things happen. First off, humans make mistakes in documentation. Even if you’re doing it proactively as a developer, you can still make mistakes.

Also, the scale is just so vast that when you’ve done a little bit of it and you move on to the next bit, chances are the first bit that you’ve touched is going to change and it’s going to be out of date. You’ll always be chasing your tail, you’ll never have anything that you trust. Then, on top of that, you’ve got complexity. You’ve got problems with going across platforms. It might be Oracle in one place, SQL Server in another, DataStage, SSIS, Informatica ETL, and SQL scripts, stored procedures that are hand-coded.

Do you want to have an expert in each of these come in and play in a manual project? They don’t have enough time. They’re doing better things than trying to do this kind of documentation. That’s before we even get to the reporting layer where there’s all kinds, again, of a whole set of different tools that are used, different technologies and transformations. Other complex events may happen inside of the reporting layer itself.

If you don’t automate this, you’re not going to get anything like the level and reliability of the information you need for what we started out with, which is BI intelligence: actionable knowledge that you can have confidence in to do your work as a BI professional.

Amnon: I can share that in the past two years, a lot of companies have done a really good job educating the market about the importance of data catalog and data dictionary, business glossary, and trying to explain what each one of them is doing and the value and benefits of using that.

I think that the next step that we see as a barrier for these organizations that do choose to adopt one or some of these tools is how to get that up and running. The manual work that needs to be in place, and the cost of IT that needs to be put in just to enjoy these kinds of capabilities, sometimes becomes the bottleneck and the barrier to adopting these kinds of tools. While you convince organizations about the importance of using these kinds of capabilities, you need to be able to ease the cost and the effort for them to enjoy, finally, those capabilities that you worked so hard to convince them they need.

I think that triggering organizations to want these capabilities, but not making it easy for them to get them, is a sure blocker. We’ve seen automation as a natural thing that organizations talk about every day: “I like your functionality, but what would it take for me to start using it or enjoying it?” People want to get rid of these kinds of manual work, they want to have a magical button saying, “I’m closing my eyes. I’m pressing a button. Boom, it’s there.” I couldn’t agree more, I think that we see that almost every day.

Now let me put you in a corner. You’re talking about the importance of each and every one of these tools, and you explained, very nicely, why we need all of them, but if you were an organization that needs to choose one, what is your recommendation? Which one should I choose?

Malcolm: If I had to– if you’ve put me on the spot, which you’re doing– if I could only get one capability, it would be the catalog because I think that’s going to give me that view into my BI assets. Frankly, I think, also the modern catalogs do have lineage capabilities, at least to some extent. If I had to give up everything else and just go for one, it would be that, but I would like, obviously, a tool that does them all and does them in an automated fashion. Based on what you were just saying, I’d also like it to be easy to implement, like it was in the cloud or something, rather than having to mess around on-prem too much.

That is a tough question to answer, and I do get it, that it’s a serious question for many organizations, but I think the one that’s going to give you the biggest bang for your buck, even though it’s not the holistic solution, is going to be the catalog.

Amnon: You mentioned, earlier, each one of these tools and the collaboration between those tools, and you shared your view. I would like to address the audience for a second. I want to bring up the second poll out of the three that we’re going to use today. From your experience and what you’ve heard today, and Melissa can pop up the poll, how much do you believe the data dictionary, business glossary, catalogs, and data lineage should all collaborate together? Do you think that they absolutely must collaborate in order to provide a really coherent, 360 view of the data journey and data? Or do you think it is important, but it’s not a must?

Do you think it would be nice to have? Or you don’t see any reason for them to collaborate and we’re in a silo-based mode? Take a few seconds to answer this and we can move forward. I don’t know how much everybody knows Octopai. We’ve been around for the past four years. After working with BI as leaders in BI organizations in telecom, insurance, banking, and healthcare, we very much believe that BI is not just a team or a group, it’s a domain that has grown in importance within the entire organization, and basically connects the data sources to the data consumers by running this through the BI.

There are so many different challenges there. When you combine the use cases that we as BI professionals run through every day– from day-to-day operation to understand if the report is showing accurate data, all the way to running a project of moving an on-prem reporting tool to a cloud-based reporting tool that can take two to three years’ project, and anything in-between, we believe that you should be able to really get intelligence about what’s going on in your BI. The derivative of this theme creates different tools that enable you to really and properly manage your BI, anything from the operational side to things you don’t know that you don’t know about your BI.

To illustrate that, what we present in our theme is the ability to consolidate different BI tools into a central place, and understand the information about the data from these kinds of tools. You can see a whole bunch of them representing a collection of tools that are responsible for shipping the data from the data sources to the data consumers, but also, very notably, you can see different tools of different vendors. Now, it’s become a salad of different tools that are growing in number and it just becomes more and more difficult to do your job. We leveraged the cloud, and its power and elasticity, to really create a crowd-wisdom or “cloud wisdom.”

We have analyzed, so far, thousands of different BI systems in order to provide the analysis for you. By the way, it takes about 30 or 45 minutes of your time just to extract, upload, and then enable us to analyze that. I think that the power of analyzing cross-platform metadata from so many different tools enables you to enjoy the analyzed metadata in different shapes of tools– some of them we talked about today, and some of them are not within this solution at Octopai and collaborate, for example, with external or other vendors’ data catalog or data governance, and so on and so forth.

You can really deal with this list of use cases that we’ve seen very much represent the users that end up spending their time looking, investigating their BI landscape almost on a daily basis. I think that what we’ve seen as a very important vehicle to provide all of that is to think differently or to challenge the status quo by analyzing the entire cross-platform, and you don’t need to do it yourself. Very easy to start. Malcolm talked about it, about everybody is intrigued to use new tools, but then the cost of getting them up and running is very painful, to a certain extent, and you want to make their life easy.

There’s no point in triggering organizations to use functionality if they need to work so hard just to get it. You want to make it easy and simple. I’m going to jump into the last question, or the last poll, that I would like to share with you.

From what you’ve heard today of the complexity and what you go through on an everyday basis, looking forward to what your organization would need from the BI, do you think that automation is the key to drive your organization from BI operation to BI intelligence, to the next level? Do you think that you can still continue to do things manually or somewhat more manually, or you think that automation is the game-changer to cross the chasm and take you to the next step?

Do you think it’s a must? Do you think, to a certain extent, it is important? Is it an option, but not a must? Or do you really disagree, and think that automation has no room in our life at this level of complexity? Take another few seconds to answer this.

[silence]

With that third poll, I would like to move to the results. Just a quick reminder that, again, don’t forget to download the whitepaper that talks more in detail about what we tried to share with you today, and I’m going to ask Melissa to share the results if possible? Great. Just a quick reminder. The first question, can you see the results? Malcolm, can you see it?

Malcolm: Yes, absolutely. Yes, they’re up.

Amnon: How strongly do you feel that you really understand the differences between all of the tools that we talked about today? Well, you can see that more than 70% are towards “I somewhat understand” or “I’m really lost”. So the topic of today seems to be on point. Malcolm, do you want to add something about this?

Malcolm: Yes, that’s not what I expected, to be honest with you, Amnon. There’s been a lot of marketing buzz out there about all of these tools, and I think that it’s not serving the market well, so we do need to find ways to clarify what these differences are. Maybe this is also the thought leaders and the consulting community as well needs to, I think, do a better job of highlighting just what the differences are and what their significance is. That is a bit of a shocker to me, actually.

Amnon: On my side, I suspected that the topic we’re trying to help educate the market on today represents what we see: that a lot of organizations still don’t really understand the differences between the tools. On the one hand, since they don’t understand, they’re hesitant to decide which tool to buy, but it’s also our responsibility as vendors to try to clarify this and help build an understanding of which tool does what, so we can help our customers make the right decision. Melissa, can we go to the next poll?

The next one, the second one out of the three, was, “How much do you believe the data dictionary, business glossary, and the rest of the tools should collaborate?” Well, “they must collaborate” is at 50%, followed by “it is important for them to collaborate.” Almost 92% said that these tools should collaborate, and probably, it represents the concept of, “No more silos within the organization. We need to have a coherent understanding about the data movement process for all the different stakeholders throughout the organization.”

Malcolm: Yes, I think people are scared of the silos. These have happened in the past as well, metadata repositories that have been siloed, like individual application data dictionaries and things like that, and it’s not good. It’s great to see, very encouraging to see that everybody wants them to collaborate. I think that is more than just the silo elimination.

These tools are qualitatively different like we’ve said, and people want the synergy to be there. That the whole of what they give you, BI intelligence, is going to be greater than the sum of their parts. I think that’s reflective of what we’re seeing here because it’s just so high. 92% is high.

Amnon: Yes, it’s amazing. I think that what’s encouraging in that is that people understood that moving away from silos to cross-platform collaboration is the way to make sure that the data is moving properly, but also can be trusted all the way from its technical term, physical term, and business term, at the end of the day.

Moving on to the last one, do you think automation is important or is a key to take your BI operation to the next level? Yes. Again, from my perspective, it’s not a surprise, but it’s really encouraging to understand or to see that the market is ready for automation. What’s your take?

Malcolm: Yes. Again, this is a little bit surprising to me, but in a really good way. It shows a maturity of thinking out there that’s really good. I mean, 82%? That’s high. That’s a heck of a lot of demand for automation. I realize that there’ll always be cases where it’s not so important, but 82% is high. That represents a definite need. People have clearly been thinking about it, and even “absolutely” is at almost 50%. It’s like if you were just proposing manual approaches, that’s not going to work anymore. People have realized that, which is good.

It looks like the distinguished audience we have today– and I’m sure they’re representative of the overall community, is really thinking about this in a very mature way, and there’s just a hunger for automation that is going to have to be met.

Amnon: Great. Thanks for sharing your view on the poll. We’re right on time with our 45 minutes that we wanted to set for ourselves. I’m going to address some of the questions. If you have any questions either to Malcolm or myself, feel free to jump over to the Q&A and put up your questions. I see a couple of them here. Here is a question to you, Malcolm, “Would business analysts as well as BI developers not be interested in data discovery and data lineage? Do you see any reason for that?”

Malcolm: Maybe it’s just reflective of my experience when I put down that I would agree that, why not? Well, let’s just go through the question again, I want to make sure I absolutely got it. It’s in the business analyst–?

Amnon: “Would business analysts as well as BI developers not be interested in this–?”

Malcolm: I think what I’ve seen, that business analysts, to me– and again, I’m just going off in my experience, have tended to be less interested in data discovery, data lineage.

I’ve had some difficulty in getting business analysts to think in data terms. They’re much more thinking about– or a lot of them are in the automation world, they don’t really do source data analysis. That could just be reflective of personal experience, the way that I prepared that slide for you, but I think business analysts ought to be interested in data discovery and data lineage for sure. If they want a job in the future, they’d better get into it.

Amnon: Here’s another one for the both of us, “Do you have any real-world numbers from client organizations as to the benefit automation can deliver?” I can share our experience. We provide automation for the analysis of your entire BI. There are some really good quotes and numbers on our website under “use cases,” but if I can give some examples. We have organizations that said that the fact that you can automatically extract metadata from a variety of different tools, centralize them, then run an analysis, and then generate the lineage in three seconds to a certain level of complexity– some had testified that we saved them anything from a few days of work to a few months of work.

The reason is that the more complexity you have in the different relationships within the lineage, the more connections you have in the data journey, and you’ve seen one lineage in Malcolm’s example. Some said that for every dollar they spent on Octopai, the ROI was that they saved anywhere between $10 and $20, given that they used automation versus manual work.

I can give you another example. We were involved in a project, one of our customers moved away from OBIEE on-prem to Tableau in the cloud. It was a year-and-a-half project. After they’ve done a calculation, they provided us with the info that, not only have they met the project on time, they were able to save almost $400,000 in terms of how many cycles were budgeted, required in order to make sure that they can always find how the data movement process works rather than just click a button and get the mapping of the data journey, and also compare this with what had been changed, and to understand, with the click of a button, what were the differences.

The automation part provides you the ability to always sync the metadata at the latest version, analyze it, click a button, and get a result. Do you want to add something about this, Malcolm?

Malcolm: Yes, I’ll try and give a roundabout answer. I haven’t got the exact stats that you have, Amnon, and I’ll be a little bit controversial here. Why not? If we think about data-centric projects, like master data management, data warehouses, things like that, they tend to have a very high failure rate. Could be 70% or higher, okay? Now, you can argue about that, and people will, but that’s my viewpoint.

We take that, and then, you look into projects where you have thinking still going back to the 1960s of the systems development lifecycle, the Waterfall, even if it’s done as Agile where every drop of the waterfall has its own waterfall, people still think, “Oh, there’s an analysis phase in my project and I’ll allocate 20%, maybe no more than that, to it.” They put that into their project and it’s all going to be done manually, by source data analysis, by people.

Then when the time is up, you’ve expended your 20%, “Okay, we’ll move on to the next phase of the project now.” You don’t do enough because it’s not automated and that amount of time is never going to be enough, so what happens? The project proceeds, you get to user-acceptance testing, the stuff that comes out at that point is all wrong, and your project goes into this bucket of 70% to 80% failures, okay?

I think in those terms, if you don’t have automation for that analysis phase, you’re going to run an extremely high risk of not delivering your project.

Amnon: Yes, not delivering your project, maybe not on time, maybe over budget. We have time for maybe another one or two questions. Here’s a question for me, “Is there an on-premise version of Octopai as well as cloud SaaS?” The answer’s no. In the beginning of our journey as a company, we had a solution for on-premise and we figured out that it’s not the best way to deliver the solution. We develop our product with great features almost on a weekly or bi-weekly basis, and given the inability to manage these kinds of solutions on-prem and the expense it becomes for the organization, we felt that it’s not the right way.

We also believe that cloud is, for some organizations, the current way to really go to the next level. Also, from our perspective, since we analyze metadata and not data, we haven’t seen an organization that actually declined the cloud solution. Yes, we had to go through some kind of security evaluations, but if you look at octopai.com and you scroll down, you can see banks, insurance, healthcare, pharma, telecom, manufacturing, universities, a whole bunch of different industries that suffer from the same inability to address their use cases, and the fact that these are on metadata makes them feel comfortable that they can upload this and run this in the cloud.

Some other questions, maybe the last question about automation? “How do you start doing this? How do you convey that to the DevOps? And what does it mean for them to start using automation?”

Malcolm: Do you want me to try this one, Amnon?

Amnon: Please.

Malcolm: I’m sat there in DevOps and I’m confronted, all of a sudden, with a problem. Something broke, something went wrong, and I’ve got to jump and figure out the answer. If I have automation, I stand a good chance of figuring out, first off, if I have a broken process, what’s downstream of it that’s going to be affected. I want to be able to tell my users that something is broken before they find out for themselves.

To me, that’s super important. Then I can sit there and say, “An ETL process broke or a job broke, let me look upstream to see what’s going on and where I can possibly localize the problem. Let me talk to those folks to see if maybe there was something that changed in the data or something went on that I haven’t been informed about, and that’s what broke my ETL job.”

Those are, I think, examples of how DevOps can benefit from this. In terms of change control, I would say when you’re documenting something as part of your change control, have lineage, have the BI glossary, have whatever it is that can be automatically generated as part of that package that goes into the change control because, again, if you’re relying on people to do this, it takes them forever, they make mistakes, and they take shortcuts, and they’re not too interested in doing post-facto documentation, so it can relieve that burden, too.

Amnon: Just to give you an example that we’ve seen some of our clients actually using. If I’m an ETL developer or I manage my ETL, and I come to work in the morning and there’s two ETL processes that fail, it would be great not to guess which other reports for which users are either down or not showing proper numbers before I start getting phone calls from the support of business users complaining about something wrong with a report.

For that, I need to know that if an ETL process is down, I can run an automated analysis from that ETL downstream to understand, automatically, through lineage, which are the reports that are dependent on that specific ETL, and not to guess.
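[Editor's illustration] The impact analysis Amnon describes, going from failed ETL processes to the reports that depend on them, can be sketched as a breadth-first walk that follows the data from the failed job toward the reports. This is a minimal Python sketch under stated assumptions: the `CONSUMERS` mapping, the asset names, and the `report_` naming convention are hypothetical, and a real tool would derive the edges from scanned metadata.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets that consume its output.
CONSUMERS = {
    "etl_load_sales": ["dw_sales_fact"],
    "etl_load_customers": ["dw_customer_dim"],
    "dw_sales_fact": ["report_daily_sales", "report_revenue_dashboard"],
    "dw_customer_dim": ["report_revenue_dashboard"],
}


def reports_affected_by(failed_jobs, consumers=CONSUMERS):
    """Return every report reachable from the failed jobs through the lineage."""
    affected = set()
    seen = set(failed_jobs)
    queue = deque(failed_jobs)
    while queue:
        node = queue.popleft()
        for consumer in consumers.get(node, []):
            if consumer in seen:
                continue
            seen.add(consumer)
            # Assumption: report assets are identified by a "report_" prefix.
            if consumer.startswith("report_"):
                affected.add(consumer)
            queue.append(consumer)
    return sorted(affected)


print(reports_affected_by(["etl_load_sales", "etl_load_customers"]))
```

With a list like this in hand, the ETL owner can warn the owners of the affected reports before the support calls start, rather than guessing.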

Yes, I think that we’re right on the hour. We have a couple of more questions, but we do have your contact so I’m going to make sure that we’re going to address them on a personal level. If you have any questions or you’re interested in touching base with us, here’s my personal email. I’ll be more than happy to address any questions or you would like to see more either with us or with Malcolm. I really would like to thank you, Malcolm, for joining today and sharing your experiences and your knowledge. Very, very valuable. Thank you.

Malcolm: Thank you for inviting me, Amnon.

Amnon: I would like to thank everyone who joined today. From our side of the world, keep safe, stay safe, stay healthy. Have a great rest of the week and thank you for joining today.

Malcolm: Thanks, guys. Bye.
