Enabling Data Discovery in a Data Mesh: The Saxo Journey

Data Mesh

Data Engineering

Data Discovery

Data Quality

Open Source

Sheetal Pratik

Jul 9, 2021

Photo by Ricardo Gomez Angel on Unsplash

Background

Saxo Bank connects clients to investment opportunities in global capital markets, delivering user-friendly and personalized multi-asset trading and investment tools to private clients and open banking solutions to wholesale clients. Saxo Bank’s founding ethos is to democratize trading and investment.

Effective use of data is important for Saxo Bank as we aim to optimize business processes, improve the client experience, and scale to meet our rapid growth. Key to this is thinking about data as a product, for which we have identified a number of principles:

Data principles at Saxo: Data as a Product

Saxo’s evolving data architecture has Apache Kafka at its core as the authoritative source for data consumers, and an in-house central data management application, “Data Workbench”, powered by LinkedIn DataHub and Great Expectations. This enables domains to publish and manage their data as a product through features like data discovery, data ownership, data lineage, metadata management, and data quality. Using this approach, we have provided a coherent way to create and share quality datasets with light-touch governance from the Data Office.
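
To make this concrete, here is a minimal sketch of how a domain might register a data product’s metadata with DataHub using its Python emitter (from the acryl-datahub package). The server URL, topic name, and owner are illustrative assumptions, not our actual configuration:

```python
# Minimal sketch: register a Kafka-backed data product in DataHub.
# Endpoint, dataset name, and owner below are hypothetical.
from datahub.emitter.mce_builder import make_dataset_urn, make_user_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    DatasetPropertiesClass,
    OwnerClass,
    OwnershipClass,
    OwnershipTypeClass,
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # hypothetical endpoint

# Identify the data product by its published form in Kafka.
dataset_urn = make_dataset_urn(platform="kafka", name="client-trades", env="PROD")

# Capture documentation at the point of origin so the dataset is discoverable.
properties = DatasetPropertiesClass(
    name="client-trades",
    description="Executed client trades, published by the trading domain.",
)

# Record a domain-team owner to federate data ownership.
ownership = OwnershipClass(
    owners=[
        OwnerClass(
            owner=make_user_urn("trading-domain-team"),  # hypothetical owner
            type=OwnershipTypeClass.DATAOWNER,
        )
    ]
)

for aspect in (properties, ownership):
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=aspect))
```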

Data Governance Framework

Saxo’s Data Mesh

Since Data Domains are central to our vision, the Data Mesh principles as outlined by Zhamak Dehghani naturally apply to how we want to federate data ownership.

Data Mesh components at Saxo

As described in the recent InfoQ Trends Report, self-service is a characteristic of data mesh that should equally apply to Data Governance tooling. The previous generation’s approach, in which a large team of analysts tried to document and reverse-engineer the data landscape, became a bottleneck for onboarding and creating new data assets.

Our approach is to tie the definition of a data product to its published form in Kafka, along with its derived representation in Snowflake. We capture metadata at the point of origin, when the dataset is being defined, which ensures that data is well documented and aids discovery. As a central team, we want to enable business teams to attach further metadata to these data products, such as data quality rules and execution results, which increases the trust and reliability of data without requiring heavy lifting from the central team. The key objective is to continually lower the barrier to data democratization through self-service and automation.

Graham Stirling, Head of Data Platforms at Saxo Bank, describes how domain-driven architecture has been key to bringing the data mesh into reality in his recent blog post: “Saxo Bank’s Best Practices for a Distributed Domain-Driven Architecture Founded on the Data Mesh”.
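
To illustrate what such a quality rule might look like, here is a minimal sketch using Great Expectations’ Pandas API. The file and column names are hypothetical, and a real pipeline would run these checks as part of dataset publication rather than ad hoc:

```python
# Minimal sketch: domain-defined quality rules with Great Expectations.
# File and column names are hypothetical.
import great_expectations as ge

trades = ge.read_csv("client_trades.csv")

# Every trade must reference a client and carry a positive notional amount.
trades.expect_column_values_to_not_be_null("client_id")
trades.expect_column_values_to_be_between("notional", min_value=0, strict_min=True)

# Validate; the results are the kind of execution output a team could
# publish alongside the data product to build trust in it.
results = trades.validate()
print(results.success)
```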

Current Status and the Road Ahead

We have found LinkedIn DataHub to be a great fit as a modern data catalog for Saxo’s data mesh architecture. Its third-generation extensible architecture is already showing benefits in adoption at scale as we onboard new domains and bring various aspects of data management together into a cohesive unit.

We are seeing productivity gains as people are able to discover schemas in the pre-prod environment before they are deployed to production, making them aware of changes before they happen.
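
As an illustration, a consumer could preview an upcoming schema programmatically with DataHub’s Python client. This is a minimal sketch, assuming a pre-prod (DEV) registration and illustrative server and dataset names:

```python
# Minimal sketch: inspect a dataset's schema in DataHub before it reaches
# production. Server URL and dataset name are hypothetical.
from datahub.emitter.mce_builder import make_dataset_urn
from datahub.ingestion.graph.client import DataHubGraph, DatahubClientConfig
from datahub.metadata.schema_classes import SchemaMetadataClass

graph = DataHubGraph(DatahubClientConfig(server="http://localhost:8080"))

# Look up the pre-prod registration of the dataset.
urn = make_dataset_urn(platform="kafka", name="client-trades", env="DEV")
schema = graph.get_aspect(entity_urn=urn, aspect_type=SchemaMetadataClass)

if schema:
    for field in schema.fields:
        print(field.fieldPath, field.nativeDataType)
```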

Data Products Published on Data Workbench

The DataHub project is open source and has a vibrant, growing, and welcoming community that has helped us tap into a network of data experts to make the project better for us and everyone. A Business Glossary was an important item on our roadmap, allowing us to link our attributes to industry-standard business terms. One of the major problems in data flows is the inconsistent naming and meaning of data elements. This results in complex mapping exercises, confusion, and ultimately data inconsistencies that can have a financial and reputational impact on any organization.

To address this issue, our team proposed and contributed the Business Glossary feature, and we have been working with the community on contributing the implementation back for others to benefit. Business terms from industry-specific standards, such as FIBO (Financial Industry Business Ontology) for finance, can complement the data elements, improving semantic understanding and aiding discovery.
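
As a sketch of how this looks in practice, a dataset can be linked to a shared glossary term through the same Python emitter. The term and dataset names below are illustrative; a real deployment would reference terms drawn from a standard such as FIBO:

```python
# Minimal sketch: attach a business glossary term to a dataset in DataHub.
# Term, dataset, and endpoint names are hypothetical.
from datahub.emitter.mce_builder import make_dataset_urn, make_term_urn
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    AuditStampClass,
    GlossaryTermAssociationClass,
    GlossaryTermsClass,
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")  # hypothetical endpoint

dataset_urn = make_dataset_urn(platform="kafka", name="client-trades", env="PROD")

# Associate the dataset with a shared business term so every domain
# describes this concept with the same name and meaning.
terms = GlossaryTermsClass(
    terms=[GlossaryTermAssociationClass(urn=make_term_urn("Finance.TradeNotional"))],
    auditStamp=AuditStampClass(time=0, actor="urn:li:corpuser:dataworkbench"),
)

emitter.emit(MetadataChangeProposalWrapper(entityUrn=dataset_urn, aspect=terms))
```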


We are excited about the upcoming DataHub roadmap, which includes a reimagined search and discovery experience, data observability, and a host of other exciting features that can help us realize our dream of a one-stop shop for data.

We are collaborating with Acryl Data, LinkedIn, and the DataHub open source community to push DataHub into the critical day-to-day workflows of our data workers. Stay tuned!
