The recent Dresner Advisory Services report State of BI 2018, as reported in Forbes, ranks data discovery among the top six initiatives most critical to business intelligence (BI), right behind dashboards, reporting, end-user “self-service”, advanced visualization and data warehousing¹. Data discovery is defined as a business user-oriented process for detecting patterns and outliers by visually navigating data or applying advanced analytics. Discovery is an iterative process: a search for the cluster of data that yields meaningful information and, in turn, drives strategic focus. When an organization thinks of data discovery, it should think of the journey rather than the destination; discovery is part and parcel of a mindset of continuous rapid experimentation and failing fast.
The latest buzz suggests that artificial intelligence alone is the holy grail of competitive advantage in the digital age. In reality, every organization must apply many different approaches, consistently and often, to gain sustainable competitive advantage. With that in mind, rather than focus only on tools and approaches, organizations should also examine the operational efficiency of their data discovery, given that over 80% of the effort in a BI project is spent on data pre-processing. Systemic improvements to data pre-processing should translate directly into better operational efficiency and, ultimately, better data-driven decisions. The goal is to deploy many machine learning models, graph databases, visualizations and insights, often and as well as we can.
Executive management is taking a more active stake in BI initiatives, which were traditionally led by operations and sales. Reporting and dashboards are how a data-driven, metric-based organization operates day to day, but data discovery can supply further contextual, nuanced information for inclusion in future dashboards, driving better decisions and revenue.
Metadata binds the enterprise information together
Metadata is data about data: a set of information that describes other data, such as its data type, data structure and data source. With all the new algorithms being deployed, an organization should agree on a common definition of what its data is, along with its limitations and benefits, before it can even begin the journey of data discovery. Metadata has traditionally not been a deliverable in its own right; it has been embedded in integration, quality, data warehousing and operational applications, working behind the scenes of the business processes it enables². Data-driven organizations must understand that metadata is the common denominator and the glue that binds the organization together across design, operation, management and governance².
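To make "data about data" concrete, here is a minimal Python sketch of a metadata record that captures the three properties named above: data type, data structure and data source. The class names, field names and the sample "orders" dataset are illustrative assumptions, not taken from any particular metadata tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ColumnMetadata:
    """Describes one column: its name, data type and meaning (hypothetical shape)."""
    name: str
    data_type: str
    description: str = ""


@dataclass
class DatasetMetadata:
    """Describes a dataset: where it comes from and how it is structured."""
    name: str
    source: str  # originating system, e.g. an ERP or CRM
    columns: List[ColumnMetadata] = field(default_factory=list)

    def schema(self) -> Dict[str, str]:
        """Return the structure as a mapping of column name to data type."""
        return {column.name: column.data_type for column in self.columns}


# Illustrative record; the dataset and its columns are made up for this sketch.
orders = DatasetMetadata(
    name="orders",
    source="ERP",
    columns=[
        ColumnMetadata("order_id", "integer", "Unique order identifier"),
        ColumnMetadata("amount", "decimal", "Order total"),
    ],
)

print(orders.schema())
```

A shared record of this kind is what lets different teams agree on what a dataset is before they start exploring it.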
Metadata management has many advantages
Broad data sharing is often cited as a top benefit of metadata management. On the other side of the coin, poor metadata management is the leading cause of limited data sharing³. As we move toward data-centric organizations with multiple silos of ERP, CRM, customer and social media data, broad data sharing is what makes a holistic view of the customer and the business possible, delivering the desired integrated 360° visibility. With an integrated metadata process, you can respond faster to change, reduce costs, reduce complexity and support revenue growth.
Recycle metadata for model deployment and product development
Because 80% of data science work goes into data pre-processing, a strong metadata management solution can be the impetus for faster model deployment and product development. Organizations can thoroughly define their data sources and contextualize them for the various user groups, increasing speed and agility for other data science and BI work. Tools with a centralized repository and a browser-based UI allow for self-service and, ultimately, faster data discovery. The key is the right process and tool for strong metadata management that meets the needs of all constituencies.
Centralize a “Metadata Warehouse” or Metadata Repository
Creating a metadata warehouse may seem like a daunting task to the BI or data warehouse manager, but the benefits far outweigh the effort. A centralized solution helps define roles and responsibilities around the process and ultimately enables the reuse of metadata objects across many projects, self-service, increased product development and faster response times. Metadata management is used by BI or data warehousing, data integration, data quality, data governance, financial applications, content management, CRM, customer data integration, master data management, data stewardship, ERP and HR. Some digital natives may argue that this is a limited list. With so many applications involved, centralization is key to realizing the benefits and fostering collaboration among the various groups. It is the first step toward many best practices, such as controlled access, data standards, glossaries, data ontologies and data lineage. If data is transformed for many BI outputs, how does it change? Being able to see and visualize that through an easy-to-use, ongoing process will ultimately yield very strong benefits.
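The lineage question above ("how does it change?") can be sketched with a small in-memory repository. This is only an illustration under stated assumptions: the MetadataRepository class, its register and lineage methods and the dataset names are hypothetical, and a real metadata warehouse would add persistence, access control and a UI.

```python
from typing import Dict, List, Optional, Tuple


class MetadataRepository:
    """Hypothetical central registry that records how datasets are derived."""

    def __init__(self) -> None:
        # dataset name -> list of (upstream dataset, transformation description)
        self._parents: Dict[str, List[Tuple[str, str]]] = {}

    def register(
        self,
        name: str,
        derived_from: Optional[str] = None,
        transformation: str = "",
    ) -> None:
        """Register a dataset, optionally recording the dataset it derives from."""
        if derived_from is not None and derived_from not in self._parents:
            raise KeyError(f"Unknown upstream dataset: {derived_from}")
        self._parents.setdefault(name, [])
        if derived_from is not None:
            self._parents[name].append((derived_from, transformation))

    def lineage(self, name: str) -> List[str]:
        """Return the transformation steps leading to a dataset, oldest first."""
        steps: List[str] = []
        for parent, transformation in self._parents[name]:
            steps.extend(self.lineage(parent))
            steps.append(f"{parent} -> {name}: {transformation}")
        return steps


# Illustrative datasets and transformations, invented for this sketch.
repo = MetadataRepository()
repo.register("crm_contacts")
repo.register("clean_contacts", "crm_contacts", "deduplicate on email")
repo.register("customer_360", "clean_contacts", "join with ERP orders")

for step in repo.lineage("customer_360"):
    print(step)
```

Because every project registers against the same repository, a transformation documented once can be looked up and reused by the next team instead of being rediscovered.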
Broadening the Metadata definition for strategic added value
In an always-on world with an ever-growing volume of data, organizations are “drowning in data yet starving for information”. As we move toward self-service discovery and flat organizations, users need immediate contextual information for every dataset, report and document. This contextual information should include not just technical metadata but also contextual, caveated explanations suited to the various business intelligence tasks. That removes much of the misunderstanding and misalignment around data definitions and makes the context and data lineage clear. Visualizing the data journey with added contextual information allows quicker and broader use of data, letting the organization re-emerge as a data-driven one.
Metadata management gives organizations stronger data governance and compliance, but it also has very strong strategic value: it lets them innovate, recycle and repurpose data pre-processing for better information, efficiency, speed and agility.
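As a rough illustration of broadening metadata beyond the technical, the Python sketch below attaches a business definition, an owner and explicit caveats to a report's technical metadata. The classes, field names and sample values are assumptions made for this example only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BusinessContext:
    """Business-facing explanation of a report (hypothetical structure)."""
    definition: str
    owner: str
    caveats: List[str] = field(default_factory=list)


@dataclass
class ReportMetadata:
    """Technical metadata plus the business context a self-service user needs."""
    name: str
    refreshed: str  # technical metadata: date of the last refresh
    context: BusinessContext

    def summary(self) -> str:
        """Render the context as plain text to show next to the report."""
        lines = [
            f"{self.name} (refreshed {self.refreshed})",
            f"Definition: {self.context.definition}",
            f"Owner: {self.context.owner}",
        ]
        lines += [f"Caveat: {caveat}" for caveat in self.context.caveats]
        return "\n".join(lines)


# Illustrative report; all values are invented for this sketch.
report = ReportMetadata(
    name="Monthly revenue",
    refreshed="2018-06-01",
    context=BusinessContext(
        definition="Invoiced revenue per calendar month",
        owner="Finance",
        caveats=["Excludes refunds", "Current month is incomplete"],
    ),
)

print(report.summary())
```

Showing the caveats beside the figures is one way to give a self-service user the context they would otherwise have to ask an analyst for.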
References:
1. Columbus, Louis. “The State of Business Intelligence, 2018.” Forbes, Forbes Magazine, 8 June 2018, https://www.forbes.com/sites/louiscolumbus/2018/06/08/the-state-of-business-intelligence-2018/.
2. Russom, Philip. “TDWI Checklist Report // Cost Justification for Metadata Management.” Transforming Data with Intelligence, TDWI, 16 Aug. 2010, www.tdwi.org/research/2010/08/cost-justification-for-metadata-management.aspx.