Using DataHub for Search & Discovery

Data Science

Big Data

Metadata

Data Engineering

Analytics

Sayak Maity

Jun 8, 2022

In large organizations across all domains, one thing stays constant: data matters. Data drives decision-making and generates critical operational insights. Powerful search & discovery tools help maximize the impact that data can have on an organization. DataHub integrates metadata from many different data sources into a single search and discovery tool that can serve a variety of users across an organization. Today, we’ll take a closer look at how DataHub enhances the workflows of a Business Analytics Lead and a Data Engineer.

Business Analytics Lead

An analytics lead is tasked with generating insights from the available data. Here’s how DataHub can help them quickly answer questions they regularly face on the job.

What is the authoritative dataset on a subject?

Using DataHub, we can check monthly query counts to see which datasets, and which queries against them, are most popular. A dataset that many people query every month is often the one the organization already treats as authoritative. This kind of usage metadata is very hard to gather without a metadata management system like DataHub.

Get insight into salient dataset usage statistics
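
To illustrate the metric itself, the Python sketch below ranks datasets by monthly query count. It does not show how DataHub computes usage statistics. It assumes a hypothetical in-memory query log of `(date, dataset)` pairs, and the helper names and dataset names are made up for the example.

```python
from collections import Counter
from datetime import date

# Hypothetical query log: one (query date, dataset name) pair per query.
# A real deployment would derive this from the warehouse's query history.
QueryLog = list[tuple[date, str]]


def monthly_query_counts(log: QueryLog, year: int, month: int) -> Counter:
    """Count the queries that touched each dataset in the given month."""
    return Counter(
        dataset for day, dataset in log if day.year == year and day.month == month
    )


def most_popular(log: QueryLog, year: int, month: int, top_n: int = 3) -> list[tuple[str, int]]:
    """Return up to top_n (dataset, query count) pairs, most queried first."""
    return monthly_query_counts(log, year, month).most_common(top_n)


if __name__ == "__main__":
    log = [
        (date(2022, 5, 2), "sales.orders"),
        (date(2022, 5, 3), "sales.orders"),
        (date(2022, 5, 9), "sales.orders_backup"),
        (date(2022, 5, 20), "sales.orders"),
        (date(2022, 4, 28), "sales.orders_backup"),
    ]
    print(most_popular(log, 2022, 5))
    # → [('sales.orders', 3), ('sales.orders_backup', 1)]
```

A dataset that dominates this ranking is a good candidate for the authoritative source. However, popularity alone does not guarantee correctness.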

How is a certain KPI calculated?

KPIs are critical for quantifying the impact of your work and measuring whether business changes are having their intended effect. With DataHub’s glossary terms, you can document how each KPI is calculated, however complicated the formula, and make that definition easily discoverable for anyone in your organization.

View information on how KPIs are calculated

Is this dashboard built on reliable sources?

Oftentimes, we’ll have access to dashboards on services like Looker, but Looker won’t show us how the underlying data was transformed and combined, starting from the raw data. In DataHub, we can view the lineage of the data: where it has been stored and how it was transformed along the way. This helps us catch and resolve issues with the reliability of the data displayed and verify that the transformations are correct.

View the lineage of your data, all the way from source to dashboard
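
Conceptually, lineage is a directed graph. The following sketch walks that graph breadth-first from a dashboard back to the raw sources that feed it. It assumes a hypothetical map from each asset to its direct upstream inputs, and all asset names are invented. It illustrates the question a lineage view answers and is not DataHub's API.

```python
from collections import deque

# Hypothetical lineage map: asset name -> names of its direct upstream inputs.
Lineage = dict[str, list[str]]


def upstream_sources(lineage: Lineage, asset: str) -> set[str]:
    """Return the raw sources (assets with no upstream inputs) that feed `asset`."""
    sources: set[str] = set()
    seen = {asset}
    queue = deque([asset])
    while queue:  # breadth-first walk toward the sources
        current = queue.popleft()
        parents = lineage.get(current, [])
        if not parents and current != asset:
            sources.add(current)  # nothing feeds this asset, so it is raw data
        for parent in parents:
            if parent not in seen:  # avoid revisiting shared inputs
                seen.add(parent)
                queue.append(parent)
    return sources


if __name__ == "__main__":
    lineage = {
        "looker.revenue_dashboard": ["dbt.revenue_daily"],
        "dbt.revenue_daily": ["dbt.orders_clean", "dbt.fx_rates"],
        "dbt.orders_clean": ["s3.raw_orders"],
        "dbt.fx_rates": ["s3.raw_fx"],
    }
    print(sorted(upstream_sources(lineage, "looker.revenue_dashboard")))
    # → ['s3.raw_fx', 's3.raw_orders']
```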

Data Engineer

Data Engineers are responsible for maintaining and improving the data systems that the organization uses for decision-making and insight generation. DataHub can greatly enhance the productivity of your data engineers by exposing relevant metadata that would otherwise be very difficult to find or generate.

Can I rely on my upstream dataset to be fresh & accurate?

In DataHub, we can dive into the lineage and check on the freshness of data and ensure that the data we’re using to make decisions and generate insights reflects the most recent information we have. If it turns out that the data is stale, we can then use DataHub to locate fresher datasets. Using the lineage tracking features, DataHub can also help us identify broken data transformations that may be blocking the system from updating data.

Track the freshness of your data
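
A freshness check over lineage comes down to comparing each upstream asset's last update time with an acceptable maximum age. The sketch below assumes hypothetical in-memory lineage and last-updated maps, and the example uses an arbitrary one-day threshold. DataHub surfaces this information in its UI; none of the names here come from its API.

```python
from datetime import datetime, timedelta

# Hypothetical inputs: direct upstream edges and each asset's last update time.
Lineage = dict[str, list[str]]
LastUpdated = dict[str, datetime]


def stale_upstreams(
    lineage: Lineage,
    last_updated: LastUpdated,
    asset: str,
    max_age: timedelta,
    now: datetime,
) -> list[str]:
    """List every upstream asset, however many hops away, older than max_age."""
    stale: list[str] = []
    seen: set[str] = set()
    stack = list(lineage.get(asset, []))
    while stack:  # depth-first walk over all upstream assets
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        updated = last_updated.get(current)
        # No recorded update time means freshness cannot be confirmed: flag it.
        if updated is None or now - updated > max_age:
            stale.append(current)
        stack.extend(lineage.get(current, []))
    return sorted(stale)


if __name__ == "__main__":
    lineage = {
        "dbt.revenue_daily": ["dbt.orders_clean"],
        "dbt.orders_clean": ["s3.raw_orders"],
    }
    last_updated = {
        "dbt.orders_clean": datetime(2022, 6, 8, 6, 0),
        "s3.raw_orders": datetime(2022, 6, 5, 6, 0),
    }
    now = datetime(2022, 6, 8, 12, 0)
    print(stale_upstreams(lineage, last_updated, "dbt.revenue_daily", timedelta(days=1), now))
    # → ['s3.raw_orders']
```

Walking every hop matters here. In the example, `dbt.orders_clean` was refreshed recently, yet it was built from raw data that is three days old. That points to a problem at or before the load of `s3.raw_orders`.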

DataHub also displays the top users of each dataset in the sidebar, so when issues like these arise, users can easily see whom to contact for more information about the history of a dataset. In this way, DataHub enables more fluid collaboration across your organization.

Which critical dashboards will I break if I make this change?

With DataHub’s Impact Analysis feature, you can automatically list all downstream dependencies of a dataset, even those multiple hops away, and filter them by facets such as ownership and domain. This lets you make changes quickly and confidently, without the tedious and error-prone process of manually identifying every dashboard that relies on a particular dataset.

Assess the risk of breaking changes to your dataset
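
Impact analysis amounts to a multi-hop walk downstream from the asset you plan to change, followed by a facet filter. The sketch below assumes a hypothetical `downstream` edge map and a hypothetical `Asset` record with `kind`, `owner`, and `domain` fields. It shows the idea rather than DataHub's implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Asset:
    """Hypothetical asset record carrying a few facets to filter on."""
    name: str
    kind: str    # e.g. "dataset" or "dashboard"
    owner: str
    domain: str


def impacted(
    downstream: dict[str, list[str]],
    assets: dict[str, Asset],
    changed: str,
    **facets: str,
) -> list[tuple[str, int]]:
    """Return (name, hops) for each downstream dependency of `changed` whose
    facets match every filter given, e.g. kind="dashboard", domain="finance"."""
    results: list[tuple[str, int]] = []
    hops = {changed: 0}
    queue = deque([changed])
    while queue:  # breadth-first, so hops is the shortest distance
        current = queue.popleft()
        for child in downstream.get(current, []):
            if child in hops:
                continue
            hops[child] = hops[current] + 1
            queue.append(child)  # keep walking even if this asset is filtered out
            asset: Optional[Asset] = assets.get(child)
            if asset is not None and all(getattr(asset, k) == v for k, v in facets.items()):
                results.append((child, hops[child]))
    return results


if __name__ == "__main__":
    assets = {
        "dbt.orders_clean": Asset("dbt.orders_clean", "dataset", "data-eng", "sales"),
        "dbt.revenue_daily": Asset("dbt.revenue_daily", "dataset", "data-eng", "finance"),
        "looker.orders": Asset("looker.orders", "dashboard", "analytics", "sales"),
        "looker.revenue": Asset("looker.revenue", "dashboard", "analytics", "finance"),
    }
    downstream = {
        "s3.raw_orders": ["dbt.orders_clean"],
        "dbt.orders_clean": ["dbt.revenue_daily", "looker.orders"],
        "dbt.revenue_daily": ["looker.revenue"],
    }
    print(impacted(downstream, assets, "s3.raw_orders", kind="dashboard"))
    # → [('looker.orders', 2), ('looker.revenue', 3)]
```

The filter is applied to results only, not to the traversal. As a result, a dashboard that sits behind a non-matching dataset is still found.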

How can I find the root cause of a breaking change?

DataHub has many features that make it easy to spot problems quickly and trace them to their root cause. Data assertions are easily accessible in a dataset’s validation tab, where we can see which assertions are succeeding and which might be failing. Learn more here about integrating Great Expectations, a framework for defining such assertions, with DataHub.

View the status of data assertions
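
To show what an assertion is, here is a minimal Python sketch of named checks that each pass or fail against a table's rows. This mirrors, in spirit, the success or failure status shown in the validation tab. The `Assertion` class and `run_assertions` helper are hypothetical and belong to neither the Great Expectations API nor the DataHub API. In practice, such assertions can be defined with a tool like Great Expectations and their results reported to DataHub.

```python
from dataclasses import dataclass
from typing import Any, Callable

Row = dict[str, Any]


@dataclass
class Assertion:
    """Hypothetical named check over a table's rows (not Great Expectations' API)."""
    name: str
    check: Callable[[list[Row]], bool]


def run_assertions(rows: list[Row], assertions: list[Assertion]) -> dict[str, str]:
    """Evaluate each assertion and report SUCCESS or FAILURE by name."""
    return {a.name: "SUCCESS" if a.check(rows) else "FAILURE" for a in assertions}


if __name__ == "__main__":
    rows = [
        {"order_id": 1, "amount": 25.0},
        {"order_id": 2, "amount": -5.0},
        {"order_id": 3, "amount": None},
    ]
    assertions = [
        Assertion("order_id is unique",
                  lambda rs: len({r["order_id"] for r in rs}) == len(rs)),
        Assertion("amount is non-negative",
                  lambda rs: all(r["amount"] is None or r["amount"] >= 0 for r in rs)),
    ]
    print(run_assertions(rows, assertions))
    # → {'order_id is unique': 'SUCCESS', 'amount is non-negative': 'FAILURE'}
```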

If assertions are failing, we can look into the stats tab and do sanity checks on the data, such as checking for an abnormally high proportion of null values.

Easily perform sanity checks on your dataset
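
The null-value sanity check is simple to express in code. The sketch below computes the fraction of null values per column over a hypothetical list of row dictionaries and flags columns above a threshold. The 0.5 default is an arbitrary assumption, and this does not show how DataHub computes the profile statistics in its stats tab.

```python
from typing import Any

Row = dict[str, Any]


def null_fractions(rows: list[Row]) -> dict[str, float]:
    """Fraction of rows in which each column is missing or None."""
    if not rows:
        return {}
    columns = sorted({col for row in rows for col in row})
    return {
        col: sum(1 for row in rows if row.get(col) is None) / len(rows)
        for col in columns
    }


def suspicious_columns(rows: list[Row], threshold: float = 0.5) -> list[str]:
    """Columns whose null fraction exceeds `threshold` (an arbitrary cutoff)."""
    return [col for col, frac in null_fractions(rows).items() if frac > threshold]


if __name__ == "__main__":
    rows = [
        {"order_id": 1, "amount": None},
        {"order_id": 2, "amount": None},
        {"order_id": 3, "amount": 10.0},
        {"order_id": 4, "amount": None},
    ]
    print(null_fractions(rows))      # → {'amount': 0.75, 'order_id': 0.0}
    print(suspicious_columns(rows))  # → ['amount']
```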

DataHub also keeps track of the statistics history, so we can go back in time and pinpoint the exact day that a dataset began having issues.
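
With a statistics history in hand, pinpointing when a problem began means scanning that history for the start of the current run of bad values. The sketch below assumes a hypothetical list of `(day, metric)` pairs, such as a column's daily null fraction, along with an illustrative threshold. It is not DataHub's API.

```python
from datetime import date
from typing import Optional


def issue_start(history: list[tuple[date, float]], threshold: float) -> Optional[date]:
    """Given (day, metric) pairs sorted by day, return the first day of the
    current streak where the metric exceeds `threshold`, or None if the
    most recent day is healthy."""
    start: Optional[date] = None
    for day, value in history:
        if value > threshold:
            if start is None:
                start = day  # a new streak of bad days begins here
        else:
            start = None  # a healthy day ends any earlier streak
    return start


if __name__ == "__main__":
    # Hypothetical daily null fraction of an `amount` column.
    history = [
        (date(2022, 6, 1), 0.01),
        (date(2022, 6, 2), 0.02),
        (date(2022, 6, 3), 0.40),
        (date(2022, 6, 4), 0.02),
        (date(2022, 6, 5), 0.55),
        (date(2022, 6, 6), 0.60),
        (date(2022, 6, 7), 0.58),
    ]
    print(issue_start(history, threshold=0.10))  # → 2022-06-05
```

The one-day spike on June 3 is ignored because the metric recovered the next day. The function reports only when the ongoing problem started.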

Takeaways

Search and discovery is a complex and multifaceted challenge that becomes increasingly difficult as your organization accumulates more datasets, dashboards, and platforms over time. Integrating DataHub into your organization’s workflow can enable a wide variety of users to find the answers they need and perform their job more effectively.

Subscribe to this blog for an upcoming post on how DataHub can help team leads who manage governance, machine learning, and data platforms.

Acryl Data and the DataHub community are adding even more features over time to magnify the positive impact that your data can have. So, we’d love for you to be part of the DataHub community! Want to get involved? Come say hello in our Slack, check out our GitHub, and watch a recording of our May Town Hall to learn about the latest in DataHub.

