What is Data Mesh?
Data mesh is an approach to data architecture that is intentionally distributed, where data is owned and governed by domain-specific teams who treat the data as a product to be consumed by other domain-specific teams. Unified standards and tools for governance, discoverability and access enable a data mesh to function smoothly as an ecosystem.
What are the principles behind data mesh architecture?
Underlying data mesh principles include:
Different business domains should own and manage their own data
The data mesh approach holds that an organization is made up of different domains, defined according to their business function. Administration, marketing and sales, for example, are all different domains within a business. They will often have overlap and will certainly interact, but each domain has its own perspectives, term definitions and needs, even when it comes to concepts shared by domains, such as “revenue.”
In keeping with that domain-specific view, the data mesh approach holds that data operations for each domain should be conducted by domain-specific, cross-functional teams. Instead of hyper-specialized data engineers who must handle requests from domain-specific data owners, producers, and consumers without actually understanding the context of the data, data engineers should be an integrated part of a domain-specific, cross-functional data team. This makes each domain an independent, effective and efficient unit when it comes to dealing with its data.
Each domain is expected to have a unified intradomain model of its data and definitions, and explicitly define the interrelationships with the other business domains in the organization.
Decentralized operations, centralized standards
While the focus on domain-specific units owning and dealing with their own data is a hallmark of decentralized data management, data mesh brings all these units into one whole through centralized standards.
Decentralized operations with centralized standards is also known as a “federated governance model.”
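As a minimal sketch of how a federated governance model might look in practice, the Python example below has each domain publish its own data product while a single shared check enforces an organization-wide metadata standard. The `DataProduct` class, the `REQUIRED_METADATA` keys and the example products are all hypothetical names chosen for illustration, not part of any particular data mesh platform.

```python
from dataclasses import dataclass, field

# Hypothetical central standard: metadata every data product must publish.
REQUIRED_METADATA = {"owner", "description", "schema_version"}


@dataclass
class DataProduct:
    """A data product owned and maintained by one business domain."""
    domain: str
    name: str
    metadata: dict = field(default_factory=dict)


def missing_metadata(product: DataProduct) -> set:
    """Return the centrally required metadata keys this product lacks."""
    return REQUIRED_METADATA - set(product.metadata)


# Each domain builds its own product (decentralized operations) ...
sales_orders = DataProduct(
    domain="sales",
    name="orders",
    metadata={
        "owner": "sales-data-team",
        "description": "Confirmed customer orders",
        "schema_version": "1.0",
    },
)
marketing_leads = DataProduct(
    domain="marketing",
    name="leads",
    metadata={"owner": "marketing-data-team"},
)

# ... but every product is checked against the same rules (centralized standards).
for product in (sales_orders, marketing_leads):
    gaps = missing_metadata(product)
    status = "compliant" if not gaps else f"missing {sorted(gaps)}"
    print(f"{product.domain}.{product.name}: {status}")
```

The point of the design is that the standard lives in one place, while the products themselves stay under the control of the domains that understand them.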
Data is a product or resource to be delivered
In the data mesh model, each domain-specific unit is responsible for providing its data as a ready-to-use product to all the other business domains. Each unit must therefore:
- Ensure the accuracy and integrity of their data
- Deliver it to everyone who needs it in a timely fashion
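The two responsibilities above could be expressed as checks that a domain team runs before publishing a data product. The following Python sketch is illustrative only: the row fields (`order_id`, `amount`), the 24-hour freshness window and the function names are assumptions, and a real team would define its own rules.

```python
from datetime import datetime, timedelta, timezone


def is_accurate(rows):
    """Integrity check: every row has an order_id and a non-negative amount."""
    for row in rows:
        amount = row.get("amount")
        if row.get("order_id") is None or amount is None or amount < 0:
            return False
    return True


def is_timely(last_updated, now, max_age=timedelta(hours=24)):
    """Timeliness check: the product was refreshed within max_age of now."""
    return now - last_updated <= max_age


def ready_to_publish(rows, last_updated, now):
    """A product is ready only if it passes both checks."""
    return is_accurate(rows) and is_timely(last_updated, now)


# Example with fixed timestamps so the outcome does not depend on the clock.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
rows = [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 35.5}]

fresh = datetime(2024, 1, 2, 6, 0, tzinfo=timezone.utc)    # 6 hours old
stale = datetime(2023, 12, 30, 6, 0, tzinfo=timezone.utc)  # several days old

print(ready_to_publish(rows, fresh, now))
print(ready_to_publish(rows, stale, now))
```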
What are the benefits of data mesh?
Because the data mesh approach distributes the data engineering burden among domain-specific data engineering teams, it can:
- Avoid misunderstandings that happen when a hyper-specialized data engineering team is disconnected from the business domain it’s being asked to do work for
- Reduce waiting times on IT requests
Since data mesh emphasizes the responsibility of each domain to treat its data as a product for consumption, it can:
- Increase the availability and use of clean, quality data throughout the enterprise
- Move an organization further on the path to data-driven decision-making
Since data mesh is designed as a pattern of self-contained, self-maintained systems, it can:
- Make scaling much easier: adding a domain is a simple matter of replicating the pattern and connecting the new piece to the existing data mesh platform
What are essential data mesh tools?
In order to construct an effective data mesh, each business domain must have tools to ensure the usability and consistent deliverability of its data products. Such tools include:
Data lineage tools
Data lineage lets you examine your data’s journey, from where it originated to its final target and everything that happened to it at every stage in its journey. A comprehensive, automated data lineage solution is important for ensuring data transparency, accuracy and overall quality.
Data lineage is also essential for quickly getting to the root of any issues that crop up, and for investigating where issues might appear if you make a proposed change to a business process.
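Both uses of lineage, tracing an issue back to its root and assessing the impact of a proposed change, amount to walking a dependency graph in opposite directions. The Python sketch below shows the idea on a tiny hand-written graph; the dataset names and the `LINEAGE` mapping are hypothetical, and a real lineage tool would collect this information automatically.

```python
# Hypothetical lineage: each dataset maps to the datasets it is built from.
LINEAGE = {
    "revenue_report": ["orders_clean", "refunds"],
    "orders_clean": ["orders_raw"],
    "refunds": [],
    "orders_raw": [],
}


def upstream(dataset, lineage=LINEAGE):
    """All datasets that feed into `dataset`, directly or indirectly.

    Useful for root-cause analysis: a problem in `dataset` may originate
    in any of these sources.
    """
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(lineage.get(node, []))
    return seen


def downstream(dataset, lineage=LINEAGE):
    """All datasets that depend on `dataset`, directly or indirectly.

    Useful for impact analysis: a change to `dataset` may affect
    any of these consumers.
    """
    return {name for name in lineage if dataset in upstream(name, lineage)}


print(sorted(upstream("revenue_report")))
print(sorted(downstream("orders_raw")))
```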
Data observability tools
Data observability tools let you stay on top of what is currently happening within your data pipelines, ideally enabling you to head off issues before they spread, with automated features like milestones and alerts.
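To make the idea of an automated alert concrete, here is a minimal Python sketch that compares the row count of a pipeline run against an expected baseline and reports anything suspicious. The function name `check_run`, the baseline figure and the 50% tolerance are assumptions made for illustration; real observability tools track many more signals.

```python
def check_run(row_count, baseline, tolerance=0.5):
    """Return a list of alert messages for one pipeline run.

    row_count: rows loaded by this run
    baseline:  rows a typical run is expected to load
    tolerance: allowed relative deviation from the baseline
    """
    alerts = []
    if row_count == 0:
        alerts.append("no rows loaded")
    elif abs(row_count - baseline) > tolerance * baseline:
        alerts.append(
            f"row count {row_count} deviates more than "
            f"{tolerance:.0%} from baseline {baseline}"
        )
    return alerts


# A normal run, an empty run, and a run that loaded far too little data.
for count in (1000, 0, 100):
    print(count, check_run(count, baseline=1000))
```

An empty list means the run looks normal; any message in the list is a candidate for an alert to the owning domain team.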