Data management strategies change constantly, and businesses must adapt to stay competitive, relying on information they can trust and can access when they need it.
What is the data mesh?
A data mesh is an innovative and still-evolving data analytics paradigm. It was designed to transform monolithic architectures, such as data warehouses and data lakes, into a more decentralized architecture.
Why the data mesh?
The data mesh responds to the challenges in centralized and monolithic data architectures, namely:
1 – A lack of business knowledge within the teams responsible for data: teams that manage centralized data too often have to work with data they do not fully understand in order to solve business problems they understand no better. The resulting back-and-forth between the data team and the business teams slows the process down and degrades the quality of the final results.
2 – A lack of flexibility in centralized data management platforms: centralizing all data on a single platform can be problematic, because the needs of large organizations are often too heterogeneous to be satisfied by a single platform.
3 – Delays in delivering data and responding to change requests: each business request requires integrating data into the centralized architecture and modifying flows at every level of the system. This makes the architecture rigid and prone to failure when changes occur.
The goal of the data mesh is to solve these problems by making organizational units (called “domains”) responsible for managing and exposing their own data to the rest of the organization. Domains better understand how their data should be used, which reduces the number of iterations needed to meet business needs and improves quality. It also removes the bottleneck of a centralized infrastructure and gives domains the autonomy to use the tools that best suit their own situation.
However, this approach also introduces risks of its own, such as the creation of data silos, duplication of effort across domains, and a lack of unified governance. To address these risks, the data mesh introduces several additional concepts:
- Data as a product: the data exposed by the different domains must be easily discoverable, understandable and usable by other units.
- Self-service data platforms: building and managing a data infrastructure is complex. Not all domains will have the appropriate resources, and duplication of effort should be avoided. Domains should be able to use a self-service platform to automate or simplify tasks such as data integration and transformation, security policy enforcement, data lineage, and identity management.
- Federated computational governance: to guarantee interoperability between the data products created by the different domains, a certain level of standardization is necessary. This includes the semantics of entities common to several domains (for example, customer and product entities) and technical aspects such as the addressability of data products and identity management. Some security policies can also be applied globally. Where possible, all of these standards and policies should be applied automatically.
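To make these three concepts more concrete, here is a minimal Python sketch of how they might fit together. It is an illustration under stated assumptions, not a reference implementation: the `DataProduct` descriptor, the `mesh://` address convention, the `check_policies` rules and the `Catalog` class are all hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor that a domain publishes for one data product."""
    domain: str
    name: str
    owner: str
    description: str
    schema: dict  # column name -> type name
    tags: tuple = ()

    @property
    def address(self) -> str:
        # Assumed global addressing convention, shared by every domain.
        return f"mesh://{self.domain}/{self.name}"


def check_policies(product: DataProduct) -> list:
    """Return the global policies a product violates (empty list = compliant)."""
    violations = []
    if not product.owner:
        violations.append("missing owner")
    if not product.description:
        violations.append("missing description")
    # Shared semantics: the customer entity is keyed the same way everywhere.
    if "customer" in product.tags and "customer_id" not in product.schema:
        violations.append("customer products must expose customer_id")
    return violations


class Catalog:
    """Minimal self-service registry: publishing runs the checks automatically."""

    def __init__(self):
        self._products = {}  # address -> DataProduct

    def publish(self, product: DataProduct) -> str:
        violations = check_policies(product)
        if violations:
            raise ValueError(f"{product.address}: {', '.join(violations)}")
        self._products[product.address] = product
        return product.address

    def search(self, tag: str) -> list:
        # Discoverability: other domains find products by tag.
        return [p for p in self._products.values() if tag in p.tags]


catalog = Catalog()
orders = DataProduct(
    domain="sales",
    name="orders",
    owner="sales-data@example.com",
    description="Confirmed customer orders, one row per order.",
    schema={"order_id": "string", "customer_id": "string", "amount": "decimal"},
    tags=("customer", "orders"),
)
print(catalog.publish(orders))
print([p.address for p in catalog.search("customer")])
```

In this sketch, the domain owns the descriptor (data as a product), the catalog is the shared tooling (self-service platform), and the policy check runs on every publish without manual review (automated, federated governance).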
Because it provides unified data access, data security and a data governance layer on top of distributed and heterogeneous data systems, data virtualization is a key technology for implementing the data mesh.
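The idea of a virtualization layer can be sketched in a few lines of Python. This is a toy illustration only, assuming two heterogeneous sources (an in-memory SQLite table and a CSV text) and a simple role-based access rule; the `SqliteSource`, `CsvSource` and `VirtualLayer` classes are hypothetical and do not represent any particular product's API.

```python
import csv
import io
import sqlite3


class SqliteSource:
    """Wraps one table of a SQLite database owned by a domain."""

    def __init__(self, conn, table):
        self.conn = conn
        self.table = table  # trusted configuration value, not user input

    def rows(self):
        cur = self.conn.execute(f"SELECT * FROM {self.table}")
        cols = [c[0] for c in cur.description]
        return [dict(zip(cols, r)) for r in cur]


class CsvSource:
    """Wraps CSV text, standing in for a file-based domain system."""

    def __init__(self, text):
        self.text = text

    def rows(self):
        return list(csv.DictReader(io.StringIO(self.text)))


class VirtualLayer:
    """Single entry point: one access path and one security check for all sources."""

    def __init__(self):
        self._views = {}  # view name -> (source, allowed roles)

    def register(self, name, source, allowed_roles):
        self._views[name] = (source, set(allowed_roles))

    def query(self, name, role):
        source, allowed = self._views[name]
        if role not in allowed:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        # Data stays in the source system; it is read only on demand.
        return source.rows()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES ('o-1', 19.5)")

layer = VirtualLayer()
layer.register("sales.orders", SqliteSource(conn, "orders"), {"analyst"})
layer.register(
    "marketing.campaigns",
    CsvSource("campaign_id,channel\nc-1,email\n"),
    {"analyst", "marketer"},
)
print(layer.query("sales.orders", role="analyst"))
print(layer.query("marketing.campaigns", role="marketer"))
```

Consumers see the same `query` interface whatever system sits behind a view, and the access rule is enforced in one place rather than in each domain's storage.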
Conclusion:
Many of today’s data platforms are centralized and monolithic, built on tightly coupled pipeline architectures and operated by siloed teams of hyper-specialized data engineers.
New solutions such as Graal Platform have been designed to meet these challenges: they provide a unified, governed and secure data platform on top of multiple distributed data systems, which suits the implementation of data mesh principles.