In a traditional centralized approach to data management, a single team is responsible for managing all the organization’s data. This team is typically composed of data engineers, data scientists, and data analysts, and they are responsible for building data pipelines, managing databases, and creating reports and dashboards.
However, as organizations grow more complex, this centralized approach becomes increasingly difficult to manage. Data requests pile up, bottlenecks form, and it becomes difficult to scale the team to meet demand. That’s where data mesh architecture comes in.
Data mesh proposes a decentralized approach to data management. In this model, data is treated as a product, and each product has its own dedicated team responsible for its management. These teams are typically cross-functional and include data engineers, data scientists, and domain experts from the business units that rely on the data. Each team is responsible for building and maintaining its own data pipelines, managing its databases, and creating reports and dashboards.
Understanding the Data Mesh Principles
Data Mesh is a relatively new approach to managing data within large, complex organizations. It was introduced by Zhamak Dehghani, a principal consultant at ThoughtWorks, in a 2019 article published on Martin Fowler’s website. The basic idea behind Data Mesh is that data should be treated as a product, owned and managed by a dedicated team just as a software product would be.
The architecture is built around four main principles, each of which plays a crucial role in the management and governance of data within an organization. These four principles are:
Domain-oriented decentralized data ownership
Data ownership is decentralized in data mesh, with each domain owning its data products. This means that each team is responsible for its own data pipelines, databases, and reports and can make changes and updates independently without needing to coordinate with a centralized team.
Data as a product
The concept of reusability lies at the core of this model, allowing data products to align with business needs in any format, be it algorithms, derived data, or dashboards. Each data product has its own dedicated team, which is responsible for ensuring the data is accurate, consistent, and adheres to best practices and standards.
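To make the idea of a data product more concrete, here is a minimal Python sketch of a product descriptor that carries its owner and a simple schema check. The class name `DataProduct`, its fields, and the example product `orders_daily` are all hypothetical and chosen only for illustration. A real implementation would likely rely on a dedicated schema or data-quality tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a data product owned by one domain team."""
    name: str
    domain: str
    owner_team: str
    schema: dict  # column name -> expected Python type
    output_formats: tuple = ("table",)  # e.g. table, dashboard, model

    def validate(self, record: dict) -> list:
        """Return a list of problems found in one record (empty if it conforms)."""
        problems = []
        for column, expected_type in self.schema.items():
            if column not in record:
                problems.append(f"missing column: {column}")
            elif not isinstance(record[column], expected_type):
                problems.append(f"wrong type for {column}")
        return problems


# Illustrative product; the names are assumptions, not from any real system.
orders = DataProduct(
    name="orders_daily",
    domain="sales",
    owner_team="sales-data",
    schema={"order_id": str, "amount": float},
)
good = orders.validate({"order_id": "A-1", "amount": 19.99})
bad = orders.validate({"order_id": "A-2"})
```

Keeping the schema next to the ownership metadata is one way for the owning team to check accuracy and consistency before a record is published to consumers.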
Self-serve data infrastructure
Domain teams have access to self-serve data infrastructure, which allows them to manage and operate their data products with tools for data discovery, data quality management, data lineage tracking, and data access control. Because the platform is shared, teams can optimize resource utilization while the organization keeps the number of technologies in use under control.
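As a rough illustration of two of these capabilities, discovery and access control, the sketch below models a tiny in-memory catalog. `DataCatalog` and its methods are hypothetical stand-ins and do not reflect the API of any real platform. For simplicity, the sketch assumes a team is identified by its domain name.

```python
class DataCatalog:
    """Hypothetical in-memory stand-in for a self-serve platform's catalog."""

    def __init__(self):
        self._products = {}  # product name -> metadata dict
        self._grants = {}    # product name -> set of teams with read access

    def register(self, name, domain, description):
        """Publish a data product; the owning domain can always read it."""
        self._products[name] = {"domain": domain, "description": description}
        self._grants[name] = {domain}

    def discover(self, keyword):
        """Return sorted product names whose name or description match."""
        keyword = keyword.lower()
        return sorted(
            name
            for name, meta in self._products.items()
            if keyword in name.lower() or keyword in meta["description"].lower()
        )

    def grant(self, name, team):
        """Give another team read access to a product."""
        self._grants[name].add(team)

    def can_read(self, name, team):
        return team in self._grants.get(name, set())


catalog = DataCatalog()
catalog.register("orders_daily", "sales", "Daily order totals per region")
catalog.register("churn_scores", "marketing", "Customer churn predictions")

found = catalog.discover("order")
before = catalog.can_read("orders_daily", "finance")
catalog.grant("orders_daily", "finance")
after = catalog.can_read("orders_daily", "finance")
```

In this sketch the owning team grants access itself, without filing a request with a central data team, which is the point of a self-serve platform.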
Federated governance
While data ownership is decentralized, governance is federated, meaning that each team must adhere to overarching standards and best practices. This includes policies and procedures around data security, privacy, and compliance, as well as technical standards for data quality, architecture, and integration.
Federated governance helps ensure data is managed consistently and effectively across the organization while allowing for the agility and flexibility of a decentralized approach.
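One way to picture federated governance is as a small set of organization-wide rules that every domain's data product is checked against, while the product itself stays under the domain's control. The sketch below is a simplified illustration of that idea. The policy names, the metadata layout, and the `check_compliance` helper are all assumptions made for this example.

```python
# Organization-wide rules agreed on centrally (hypothetical examples).
GLOBAL_POLICIES = {
    # Every product must name an owning team.
    "has_owner": lambda p: bool(p.get("owner_team")),
    # Every column flagged as PII must carry a classification label.
    "pii_is_classified": lambda p: all(
        "classification" in col
        for col in p.get("columns", [])
        if col.get("pii")
    ),
}


def check_compliance(product: dict) -> list:
    """Return the names of the global policies that a data product violates."""
    return [name for name, rule in GLOBAL_POLICIES.items() if not rule(product)]


compliant = {
    "name": "customers",
    "owner_team": "crm-data",
    "columns": [
        {"name": "email", "pii": True, "classification": "restricted"},
        {"name": "signup_date", "pii": False},
    ],
}
noncompliant = {
    "name": "leads",
    "owner_team": "",
    "columns": [{"name": "phone", "pii": True}],
}

ok_result = check_compliance(compliant)
violations = check_compliance(noncompliant)
```

The rules live in one shared place, but each domain team can run the check on its own products, so standards stay consistent without routing every change through a central team.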
Why Should You Know About Data Mesh?
The architecture represents a fundamental shift in how data is managed within large organizations, shedding monolithic architectures in favor of a decentralized model. Giving each data product its own dedicated team lets business users iterate and experiment faster, which leads to quicker and more informed decision-making.
A distributed ecosystem of data products removes complications such as hand-crafted data ingestion processes, which create excessive friction because of their dependencies. In a centralized model, analytical cycles grow long when new data sources are added continuously, the workload becomes too large for the central team to handle, and the data team operates in a silo, with little interaction with other parts of the organization.
In a data mesh model, each team responsible for a specific data product includes domain experts from the business units that rely on the data. This encourages cross-functional collaboration and a deeper understanding of business needs, which can lead to more meaningful insights via an advanced analytics acceleration platform.
The domain teams are authorized to operationalize the data products as required to gain faster insights. Data users can access customized and enterprise-wide unified views of data across domains, even for massive and complex datasets.
As a game-changing approach to data management, data mesh addresses the challenges posed by ever-increasing data volumes in fast-paced business environments. Organizations still need a consolidated effort to bring data products from different domains together into one layer.