Data is now the soul of every digital business, and the pandemic has accelerated the adoption of analytics and AI as a business function. Over the past few years, organizations have had to move rapidly to new data technologies, modern data architectures, and infrastructure to drive innovations such as personalized product recommendations and predictive analytics. Despite these changes, the collection, integration, and governance of data remain the main inhibitors to analytics and AI success, according to Deloitte research.
According to Zhamak Dehghani, who introduced the data mesh concept, the evolution of business insights platforms can be divided into three generations:
- In the first generation, organizations deployed traditional data warehouses to generate reports as needed. These were very expensive and lacked a centralized approach.
- In the second generation, as big data and analytics gained popularity, data warehouses were replaced by a central data lake. Though this approach became very popular, bottlenecks such as growing data volumes, limited scalability, and the handling of domain-specific data highlighted the need for a decentralized approach.
- Current third-generation platforms address some of these gaps and have drawn attention to product thinking with data, self-service platform design, and distributed domain-driven architecture. Together, these ideas gave rise to the data mesh architecture.
Data Mesh vs Data Lake – A Paradigm Shift
Companies no longer rely on gut-driven decisions; they increasingly aim to be data-driven. The data lake has long been at the core of such organizations, providing democratized access to data for all business functions.
Data mesh vs data lake – Rethinking data architecture
As data lakes grew, so did the complexity of managing them. In a typical data lake architecture, data producers generate data and send it to data consumers. Producers tend to be tech-savvy, while consumers tend to be business-savvy. As a result, consumers often had to go back to producers to understand the domain and the intrinsic value of the data. This centralized data ownership created two main challenges for businesses:
- Most of the data engineering team’s effort goes into fixing issues and revalidating data, and searching for and interpreting the data is inherently difficult.
- Data users are often unaware of the source domain from which the data is extracted, which leads to low data quality.
Data mesh, a newer architectural concept, addresses these big data issues by decentralizing data ownership and federating data governance. Its proponents claim that a distributed, domain-driven architecture can fuel big data innovation and resolve scalability challenges. The approach tackles these challenges with a shift in thinking at four levels:
- At the platform level – Data is treated as a product within a distributed, domain-based data architecture. Data responsibilities are vested in domain owners, and data products are shared between domains.
- At the technical level – Data ownership is decentralized while the data infrastructure is shared, with a conceptual separation between the two. Cross-functional data is offered through APIs to serve other business needs, and responsibility for data quality and data products rests with domain owners.
- At the team level – Domain owners collaborate more closely to bring in new data sources and develop solutions that meet business needs.
- At the expertise level – Each team has a broader skill set, which makes it easier to move between different data products.
Data Mesh vs Data Lake – Scalability to Drive Insights Faster
Data lakes have democratized access for all business users, but they have also created siloed data and organizational structures that do not scale to deliver the promised value of a data-driven organization. In practice, we find ubiquitous data, disconnected data teams, and data consumers with little access to domain experts. The data mesh approach powers a next-generation enterprise data platform architecture through the convergence of the following:
- Distributed Domain-Driven Architecture – Microservices architecture has moved businesses toward domain-oriented capabilities, but the same idea has largely been ignored for data. In a data mesh, domains host and serve their data in a consumable way, moving away from a centralized data lake architecture. The traditional push-and-ingest model is replaced with a pull-and-serve model across all domains.
How does it alter insights?
Cleansing, preparing, and aggregating data are the responsibility of the domain, and data pipelines are handled within the domain, which helps resolve data quality issues at the source. However, each domain data set must have a service level objective (SLO) to ensure quality.
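To make the SLO idea more concrete, here is a minimal Python sketch of a domain team checking one data set against two assumed objectives: freshness and completeness. The names `DataSetSLO` and `check_slo`, the "orders" data set, and the choice of metrics are hypothetical. They are not taken from any specific data mesh tool. In practice, the domain and its consumers would agree on which SLOs to track.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class DataSetSLO:
    """Hypothetical service level objective for one domain data set."""
    max_staleness: timedelta   # newest data may be at most this old
    min_completeness: float    # required share of populated required fields (0.0 to 1.0)


def check_slo(records: list[dict], required_fields: list[str],
              last_updated: datetime, slo: DataSetSLO,
              now: Optional[datetime] = None) -> dict[str, bool]:
    """Report whether a batch of records meets each (assumed) SLO target."""
    if now is None:
        now = datetime.now(timezone.utc)
    fresh = now - last_updated <= slo.max_staleness

    # Share of required fields, across all records, that are not None.
    total = len(records) * len(required_fields)
    filled = sum(1 for r in records for f in required_fields
                 if r.get(f) is not None)
    completeness = filled / total if total else 0.0

    return {"freshness": fresh,
            "completeness": completeness >= slo.min_completeness}


# Example: a hypothetical "orders" data set owned by a sales domain.
slo = DataSetSLO(max_staleness=timedelta(hours=24), min_completeness=0.95)
orders = [{"order_id": 1, "customer_id": "c1"},
          {"order_id": 2, "customer_id": None}]
result = check_slo(orders, ["order_id", "customer_id"],
                   last_updated=datetime(2024, 1, 1, 12, tzinfo=timezone.utc),
                   slo=slo,
                   now=datetime(2024, 1, 2, tzinfo=timezone.utc))
print(result)  # {'freshness': True, 'completeness': False}
```

In this example, the data is fresh (12 hours old against a 24-hour target). However, only 3 of 4 required values are populated, so completeness fails. Under a data mesh, the owning domain is responsible for fixing that gap before consumers pull the data.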
- Data Product Thinking – Over the past few years, operational teams have adopted product thinking by creating a great developer experience for their APIs and making them understandable and discoverable. Data mesh applies the same thinking to data: each data product should be discoverable, addressable, trustworthy, and governed by global standards. Well-described semantics and syntax, centralized metadata, and data provenance and lineage help avoid silos and hyper-specialized ownership.
Do we need data as a product?
The marketing manager of an online retailer would struggle to find unnamed analytics solutions that meet their needs, but would readily use a data-driven customer engagement platform. A well-defined, identifiable data product delivers results that fit the business context and drives decision-making at scale.
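The data product qualities described above can be sketched as a self-describing descriptor plus a small catalog. Those qualities are discoverability, addressability, well-described semantics and syntax, and lineage. This is a minimal Python illustration under stated assumptions. `DataProduct`, `Catalog`, and the `dataproduct://domain/name` address scheme are all hypothetical. An in-memory dictionary stands in for a real metadata service.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor that makes a data product self-describing."""
    domain: str                     # owning domain, e.g. "marketing"
    name: str                       # human-friendly product name
    owner: str                      # accountable team
    description: str                # semantics, written for consumers
    schema: dict[str, str]          # column name -> type (syntax)
    upstream: tuple[str, ...] = ()  # addresses of source products (lineage)

    @property
    def address(self) -> str:
        # Unique, stable address; this URI scheme is an assumption.
        return f"dataproduct://{self.domain}/{self.name}"


@dataclass
class Catalog:
    """Minimal in-memory catalog standing in for a real metadata service."""
    products: dict[str, DataProduct] = field(default_factory=dict)

    def register(self, product: DataProduct) -> str:
        if product.address in self.products:
            raise ValueError(f"{product.address} is already registered")
        self.products[product.address] = product
        return product.address

    def search(self, keyword: str) -> list[str]:
        # Discoverability: case-insensitive match on name or description.
        kw = keyword.lower()
        return sorted(addr for addr, p in self.products.items()
                      if kw in p.name.lower() or kw in p.description.lower())

    def lineage(self, address: str) -> list[str]:
        # Provenance: walk upstream addresses depth-first, skipping repeats.
        seen: list[str] = []
        stack = list(self.products[address].upstream)
        while stack:
            addr = stack.pop()
            if addr not in seen:
                seen.append(addr)
                if addr in self.products:
                    stack.extend(self.products[addr].upstream)
        return seen


# Example with two hypothetical products from different domains.
catalog = Catalog()
orders = DataProduct(
    domain="sales", name="orders", owner="sales-team",
    description="Completed customer orders",
    schema={"order_id": "int", "customer_id": "str"})
engagement = DataProduct(
    domain="marketing", name="customer-engagement", owner="marketing-team",
    description="Customer engagement scores derived from orders",
    schema={"customer_id": "str", "score": "float"},
    upstream=(orders.address,))
catalog.register(orders)
catalog.register(engagement)

print(catalog.search("engagement"))         # ['dataproduct://marketing/customer-engagement']
print(catalog.lineage(engagement.address))  # ['dataproduct://sales/orders']
```

Here the marketing manager can find the customer engagement product by keyword. The lineage walk then shows that it depends on the sales domain's orders product. This is the kind of provenance that helps avoid knowledge silos.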
- Self-Service Data Infrastructure as a Platform – Hiding the underlying complexity and keeping the data infrastructure domain-agnostic, free of domain-specific information, is central to building data infrastructure as a platform.
Why such a centralized platform?
Although the tools and techniques are not yet mature, such an infrastructure platform avoids duplicated effort in setting up data pipeline engines, storage, and streaming infrastructure. It also reduces the lead time to create a new data product and drives automation.
Treating data as a product changes ownership responsibilities, increases visibility, and makes data easier to consume. In this way, data mesh avoids human knowledge silos and helps machine learning and AI experts create value and innovation from data.
Do you think data mesh is still just a concept, or are you interested in implementing it? We covered a few use cases in our previous data mesh blog; now let us look at a few implementations by industry leaders.
A Few Data Mesh Implementations
Data mesh is not a specific piece of code or a tech stack that solves your problems at the click of a button. Many experts also argue that the approach is suited mainly to large organizations. In reality, however, data is diverse and ubiquitous for businesses of every size, and growth for any organization depends on modern data management solutions. A few industry leaders have already used data mesh to mitigate the risk of siloed data.
Zalando, Europe’s leading fashion platform, transitioned to a self-service data mesh architecture built on the centralized infrastructure layer of its AWS data lake. Its insights team could access the data it needed, and the analytics solutions scaled with business requirements.
Netflix, the online streaming platform, processes trillions of events every day. Because data integration is core to its business operations, Netflix implemented a data mesh architecture to optimize costs, improve performance, and mitigate operational risks.
Data mesh is not specific to any industry: financial giant JP Morgan implemented a data mesh to facilitate data reuse and derive insights faster.
Are you interested? Please connect with us for more information about our data mesh offering.