In the search for the best data architecture for an organization’s present and future requirements, enterprises have many options to choose from. Because software can be structured and packaged in so many ways, the number of options is large, and selecting the right one can be difficult. This is why patterns have recently emerged to help organizations on their data management journey, including data fabric and data mesh.
At first glance, data fabric and data mesh look similar from a conceptual standpoint. Meshes are usually made from fabric, and both can be shaped as required. This allows IT departments to lay them on top of other systems that are continuously crunching data.
However similar the two approaches look, there are distinct differences between them that become noticeable only on closer inspection.
What is Data Fabric
The first definition of data fabric came in the mid-2000s from Noel Yuhanna, an analyst at Forrester. Conceptually, data fabric is a metadata-based way of connecting a varied set of data tools. The objective is to address the main pain points of big data projects in a cohesive manner and through a self-service model. Data fabric solutions deliver capabilities such as data access, discovery, transformation, integration, governance, lineage, and security.
The data fabric concept has gained significant momentum. It helps simplify accessing and managing data in an increasingly heterogeneous environment, which comprises transactional and operational data stores, data lakes, data warehouses, and lakehouses. A growing number of organizations are ending up with data silos, and with cloud computing the problem of data diversification keeps getting bigger.
By placing a single data fabric on top of its data repositories, an enterprise can bring unified management to its different data sources and to downstream data consumers such as data scientists, data engineers, and data analysts. Note, however, that it is data management that is unified, not the actual storage, which remains distributed. Vendors such as Informatica and Talend provide data fabric products with the capabilities described above.
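The idea of unified management over distributed storage can be illustrated with a minimal Python sketch. Everything here is hypothetical and for illustration only: the `DataFabric` and `SourceEntry` names, the metadata fields, and the in-memory lists standing in for real stores are assumptions, not the API of any vendor product.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]


@dataclass
class SourceEntry:
    """Metadata the (hypothetical) fabric keeps about one physical source."""
    name: str
    kind: str                             # e.g. "warehouse", "lake"
    owner: str
    reader: Callable[[], Iterable[Row]]   # how to fetch rows; data stays in the source
    tags: List[str] = field(default_factory=list)


class DataFabric:
    """Illustrative management layer: a metadata catalog plus access routing."""

    def __init__(self) -> None:
        self._catalog: Dict[str, SourceEntry] = {}

    def register(self, entry: SourceEntry) -> None:
        # Only metadata is stored here; the rows are never copied into the fabric.
        self._catalog[entry.name] = entry

    def discover(self, tag: str) -> List[str]:
        # Discovery works purely on metadata.
        return sorted(n for n, e in self._catalog.items() if tag in e.tags)

    def read(self, name: str) -> List[Row]:
        # Access is routed to whichever store actually holds the data.
        return list(self._catalog[name].reader())


# Two independent "stores", standing in for a warehouse and a lake.
warehouse_orders = [{"order_id": 1, "amount": 120.0}, {"order_id": 2, "amount": 80.0}]
lake_clicks = [{"user": "a", "page": "/home"}]

fabric = DataFabric()
fabric.register(SourceEntry("orders", "warehouse", "finance",
                            lambda: warehouse_orders, ["sales"]))
fabric.register(SourceEntry("clicks", "lake", "web",
                            lambda: lake_clicks, ["web", "sales"]))

print(fabric.discover("sales"))
print(fabric.read("orders"))
```

Consumers go through one interface for discovery and access, while each dataset stays in its own store.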
What is Data Mesh
Data mesh solves many of the same problems as data fabric, such as the challenge of managing data in a heterogeneous environment, but the method is different. While data fabric creates a single virtual management layer on top of distributed data storage, the data mesh approach relies on distributed groups of teams that manage data as they see fit, subject to some common governance protocols.
The concept of data mesh was defined by Zhamak Dehghani, director of tech incubation at Thoughtworks North America. The fundamental principle behind the data mesh approach is resolving the incompatibility between the data lake and the data warehouse. The first-generation data warehouse is designed to store massive quantities of structured data, which is mainly consumed by data analysts.
The second-generation data lake, by contrast, is used for storing enormous amounts of unstructured data, which is predominantly used for building predictive machine learning models. In her definition, Dehghani also described a third-generation system (known as Kappa), which is characterized by real-time data flows and the adoption of cloud services. However, this does not resolve the gap between first- and second-generation systems from a usage point of view.
To keep data in sync, many enterprises develop and maintain extensive ETL data pipelines. This in turn creates a need for highly specialized data engineers who are able to keep such systems running.
A critical point that Dehghani put forward is that data transformation should not be hardwired into the data by engineers. Instead, it should act like a filter that is applied to a common set of data available to all users.
So, instead of developing a complex ETL pipeline, the data is stored in its original form, and ownership of the data is taken by a team of domain experts. The data mesh architecture described by Dehghani has the following characteristics:
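One way to picture transformation as a filter is the minimal Python sketch below. The raw records are kept untouched and each consumer applies its own view at read time. The record fields and the function names (`german_sales_in_euros`, `total_by_country`, `read`) are made up for illustration and are not taken from the data mesh definition itself.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]
View = Callable[[Iterable[Row]], List[Row]]

# Shared data kept in its original form; no view below modifies it.
RAW_EVENTS: List[Row] = [
    {"user": "a", "country": "DE", "amount_cents": 1250},
    {"user": "b", "country": "US", "amount_cents": 400},
    {"user": "c", "country": "DE", "amount_cents": 990},
]


def german_sales_in_euros(rows: Iterable[Row]) -> List[Row]:
    """One consumer's view: filter to DE and convert cents to euros."""
    return [
        {"user": r["user"], "amount_eur": r["amount_cents"] / 100}
        for r in rows
        if r["country"] == "DE"
    ]


def total_by_country(rows: Iterable[Row]) -> List[Row]:
    """Another consumer's view: aggregate the same raw data differently."""
    totals: Dict[str, int] = {}
    for r in rows:
        totals[r["country"]] = totals.get(r["country"], 0) + r["amount_cents"]
    return [{"country": c, "amount_cents": t} for c, t in sorted(totals.items())]


def read(view: View) -> List[Row]:
    """Apply a view to the common data at read time."""
    return view(RAW_EVENTS)


print(read(german_sales_in_euros))
print(read(total_by_country))
```

Both consumers read the same untouched source, so neither has to maintain a separate transformed copy.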
- Domain-oriented, decentralized data ownership and architecture
- Data as a product
- Self-service data infrastructure as a platform
- Federated computational governance
In a nutshell, the data mesh approach recognizes that only data lakes have the flexibility and scalability to handle analytics requirements.
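The four characteristics listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `DataProduct`, `MeshPlatform`, and the three policy functions are hypothetical names, and real governance rules would be far richer.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class DataProduct:
    """Data as a product: owned by a domain, with a published schema."""
    name: str
    domain: str                     # owning business domain
    owner: str                      # accountable team
    schema: Dict[str, type]         # contract offered to consumers
    serve: Callable[[], List[Row]]  # how the domain team serves the data


# Federated computational governance: global rules written as code
# and checked automatically for every product (hypothetical rules).
def has_owner(p: DataProduct) -> bool:
    return bool(p.owner)


def has_schema(p: DataProduct) -> bool:
    return len(p.schema) > 0


def rows_match_schema(p: DataProduct) -> bool:
    return all(
        set(r) == set(p.schema)
        and all(isinstance(r[k], t) for k, t in p.schema.items())
        for r in p.serve()
    )


GLOBAL_POLICIES = [has_owner, has_schema, rows_match_schema]


class MeshPlatform:
    """Self-service platform: domain teams publish products themselves."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        failed = [rule.__name__ for rule in GLOBAL_POLICIES if not rule(product)]
        if failed:
            raise ValueError(f"{product.name} violates: {', '.join(failed)}")
        self._products[product.name] = product

    def consume(self, name: str) -> List[Row]:
        return self._products[name].serve()


platform = MeshPlatform()
platform.publish(DataProduct(
    name="orders", domain="sales", owner="sales-data-team",
    schema={"order_id": int, "amount": float},
    serve=lambda: [{"order_id": 1, "amount": 120.0}],
))
print(platform.consume("orders"))
```

Each domain team publishes and serves its own product, while the shared policies are enforced the same way for everyone at publish time.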
Data Mesh vs Data Fabric
As observed above, data mesh and data fabric have quite a few similarities. Let us now look at the differences between the two.
According to Forrester analyst Noel Yuhanna, the major difference between the data mesh and data fabric approaches is the way APIs are used.
A data mesh is primarily API-driven and aimed at developers, while data fabric is not. In this respect data fabric is essentially the opposite of data mesh: in a data mesh, developers write code against the APIs to interface with the data, whereas data fabric is a no-code or low-code method in which the API integration happens inside the fabric without users leveraging the APIs directly.
According to another analyst, James Serra, a big data and data warehousing architect at Ernst & Young, the difference between data mesh and data fabric lies in the type of users who access them.
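The contrast can be made concrete with a small Python sketch. It is one reading of the distinction, under loud assumptions: the two API functions, the `fabric_run` engine, and the shape of the declarative `spec` are all invented for illustration and do not correspond to any real product.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

ORDERS: List[Row] = [
    {"order_id": 1, "customer_id": 10, "amount": 120.0},
    {"order_id": 2, "customer_id": 11, "amount": 80.0},
]
CUSTOMERS: List[Row] = [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 11, "name": "Lin"},
]


# Hypothetical domain APIs, one per owning team.
def orders_api() -> List[Row]:
    return ORDERS


def customers_api() -> List[Row]:
    return CUSTOMERS


# Mesh style: the developer writes the integration code against the APIs.
def mesh_orders_with_names() -> List[Row]:
    names = {c["customer_id"]: c["name"] for c in customers_api()}
    return [{**o, "name": names[o["customer_id"]]} for o in orders_api()]


# Fabric style: the user supplies a declarative spec and the fabric
# performs the API calls and the join internally.
SOURCES: Dict[str, Callable[[], List[Row]]] = {
    "orders": orders_api,
    "customers": customers_api,
}


def fabric_run(spec: Dict[str, str]) -> List[Row]:
    left = SOURCES[spec["from"]]()
    right = SOURCES[spec["join"]]()
    key = spec["on"]
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]


spec = {"from": "orders", "join": "customers", "on": "customer_id"}

print(mesh_orders_with_names())
print(fabric_run(spec))
```

Both paths return the same joined rows. The difference is who writes the integration logic: the developer in the mesh style, the platform in the fabric style.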
Data mesh and data fabric both provide access to data across different technologies and platforms. The difference is that data fabric is more technology-centric while data mesh is more dependent on organizational change.
According to David Wells, an analyst at Eckerson Group, an enterprise can use data mesh, data fabric, and even a data hub together. Wells adds that these are concepts and are not mutually exclusive.
Data fabric products are mainly built around production usage patterns, whereas data mesh products are designed by business domains. With data fabric, metadata discovery is continuous and analysis is an ongoing process, while with data mesh the metadata operates within a localized business domain and is static in nature.
From a deployment standpoint, data fabric makes use of the infrastructure already available, whereas data mesh extends the current infrastructure with new deployments in business domains.
Both data mesh and data fabric have a place in the big data boardroom when it comes to finding the right architecture or architectural framework.