Using DataHub for Search & Discovery

In large organizations across every domain, data is the one constant. It drives decision-making and generates critical operational insights. Powerful search & discovery tools amplify the impact that data can have on an organization. DataHub integrates many different data sources into a single search and discovery tool that serves a variety of users across an organization. Today, we’ll take a closer look at how DataHub enhances the workflow of a Business Analytics Lead and a Data Engineer.

Business Analytics Lead

An analytics lead is tasked with generating insights from the available data. Here’s how DataHub can help them quickly answer questions they regularly face on the job.

What is the authoritative dataset on a subject?

Using DataHub, we can check how many queries each dataset receives per month to see which datasets are most widely used. A dataset that many people query regularly is a strong candidate for the authoritative source on a subject. This kind of usage metadata is very hard to come by without a metadata management system like DataHub.

Get insight into salient dataset usage statistics

How is a certain KPI calculated?

KPIs are critical for quantifying the impact of your work and measuring whether business changes are having the intended effect. With DataHub’s glossary terms, you can save information about how KPIs are calculated, however complicated they may be, and make it easily discoverable for anyone in your organization.

View information on how KPIs are calculated

Is this dashboard built on reliable sources?

Oftentimes, we’ll have access to dashboards on services like Looker, but Looker won’t give us visibility into how the data has been transformed and combined starting from the raw data. In DataHub, we’re able to view the lineage of where data has been stored and how it was transformed. This helps us catch and resolve issues regarding the reliability of the data displayed and verify the correctness of the transformations.

View the lineage of your data, all the way from source to dashboard

Data Engineer

Data Engineers are responsible for maintaining and improving the data systems that the organization uses for its decision-making and insight generation. DataHub can greatly enhance the productivity of your data engineers by exposing relevant metadata that would otherwise be very difficult to find or generate.

Can I rely on my upstream dataset to be fresh & accurate?

In DataHub, we can dive into the lineage and check on the freshness of data and ensure that the data we’re using to make decisions and generate insights reflects the most recent information we have. If it turns out that the data is stale, we can then use DataHub to locate fresher datasets. Using the lineage tracking features, DataHub can also help us identify broken data transformations that may be blocking the system from updating data.

Track the freshness of your data

DataHub also displays the top users of each dataset in the sidebar, so when issues like these come up, it is easy to see whom to contact for more information about the history of a dataset. In this way, DataHub can enable even more fluid collaboration in your organization.

Which critical dashboards will I break if I make this change?

With DataHub’s Impact Analysis feature, you can automatically list all dependencies even if they are multiple hops away. You can filter these dependencies based on various facets like ownership, domain, and more. This allows you to make changes more confidently and quickly, as you need not undertake the tedious and error-prone process of identifying all the dashboards that rely on a particular dataset.

Assess the risk of breaking changes to your dataset
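To make the idea of multi-hop dependencies concrete, here is a minimal conceptual sketch of a breadth-first walk over a lineage graph. It is not DataHub's implementation or API. The graph, the entity names, and the downstream_impact helper are all hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical lineage graph: each key feeds the entities listed as its value.
DOWNSTREAM = {
    "raw.orders": ["staging.orders_clean"],
    "staging.orders_clean": ["marts.daily_revenue", "marts.customer_ltv"],
    "marts.daily_revenue": ["dashboard.exec_kpis"],
    "marts.customer_ltv": ["dashboard.retention"],
}


def downstream_impact(graph, start):
    """Return {entity: hops} for everything reachable downstream of `start`."""
    hops = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        for child in graph.get(node, []):
            # Skip entities already seen so shared dependencies are counted once.
            if child != start and child not in hops:
                hops[child] = depth + 1
                queue.append((child, depth + 1))
    return hops


impact = downstream_impact(DOWNSTREAM, "raw.orders")
# Keep only the dashboards, mimicking a filter on entity type.
dashboards = sorted(name for name in impact if name.startswith("dashboard."))
print(dashboards)
```

In this toy graph, both dashboards sit three hops away from raw.orders, which is exactly the kind of indirect dependency that is easy to miss when tracing by hand.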

How can I find the root cause of a breaking change?

DataHub has many features that make it easy to identify problems quickly and help discover the root cause. Data assertions are easily accessible in the validation tab for a dataset, so we can see which assertions are succeeding and which assertions might be failing. Learn more about how to integrate Great Expectations, a way to define these assertions, with DataHub here.

View the status of data assertions

If assertions are failing, we can look into the stats tab and do sanity checks on the data, such as checking for an abnormally high proportion of null values.

Easily perform sanity checks on your dataset
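As a rough illustration of the null-value check mentioned above, the sketch below computes the fraction of missing values in a column and compares it to a threshold. The sample rows, the column name, and the 25% threshold are assumptions made up for this example; DataHub surfaces comparable statistics in its UI without any code like this.

```python
def null_fraction(rows, column):
    """Return the fraction of rows where `column` is missing or None."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) is None)
    return missing / len(rows)


# Hypothetical sample of a dataset, one dict per row.
rows = [
    {"order_id": 1, "customer_id": "a1"},
    {"order_id": 2, "customer_id": None},
    {"order_id": 3, "customer_id": None},
    {"order_id": 4, "customer_id": "b7"},
]

THRESHOLD = 0.25  # assumed acceptable share of nulls for this column

fraction = null_fraction(rows, "customer_id")
suspicious = fraction > THRESHOLD
print(f"null fraction: {fraction:.2f}, suspicious: {suspicious}")
```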

DataHub also keeps track of the statistics history, so we can go back in time and pinpoint the exact day that a dataset began having issues.
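The same reasoning can be sketched in a few lines: given a daily history of a statistic, find the first day it crossed a threshold. The history values, dates, and the first_bad_day helper below are hypothetical and only illustrate the idea, not how DataHub stores or queries its statistics.

```python
from datetime import date

# Hypothetical daily history of the null fraction for one column.
history = [
    (date(2022, 5, 1), 0.01),
    (date(2022, 5, 2), 0.02),
    (date(2022, 5, 3), 0.41),
    (date(2022, 5, 4), 0.39),
]


def first_bad_day(history, threshold):
    """Return the earliest day whose value exceeds `threshold`, or None."""
    for day, value in sorted(history):
        if value > threshold:
            return day
    return None


onset = first_bad_day(history, threshold=0.25)
print(onset)
```

Knowing the onset date narrows the search to the changes deployed around that day.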

Takeaways

Search and discovery is a complex and multifaceted challenge that becomes increasingly difficult as your organization accumulates more datasets, dashboards, and platforms over time. Integrating DataHub into your organization’s workflow can enable a wide variety of users to find the answers they need and perform their job more effectively.

Subscribe to this blog for an upcoming post on how DataHub can help team leads managing governance, machine learning, and data platforms.

Acryl Data and the DataHub community are adding even more features over time to magnify the positive impact that your data can have. So, we’d love for you to be part of the DataHub community! Want to get involved? Come say hello in our Slack, check out our GitHub, and watch a recording of our May Town Hall to learn about the latest in DataHub.