Release update! Lineage Vis Update, dbt meta, Data Freshness Indicator, & new Java Library

Metadata Management

Data Engineering

Open Source

Release Notes

Maggie Hays

Jan 11, 2022

🥂 Happy 2022, DataHub Enthusiasts!

We’ve started off the year with high-impact improvements to user and developer experience; let’s get you caught up on what you may have missed in recent releases.

Lineage Visualization Update: Show Full Entity Names

We know that sometimes entity names can get very looong, making it tough to interpret the lineage visualization. Starting with v0.8.22, you can now toggle between showing the full or truncated entity titles in the lineage vis:

See it in action here!

Automatically map detail from dbt meta to DataHub Datasets

dbt supports capturing critical model-specific metadata using the meta configuration, allowing authors to specify owners, model status, tags, and more. As of v0.8.22, our dbt source now supports actions to map dbt meta values to DataHub Datasets.

For example, if a dbt model has the meta config has_pii: true, we can define an action that checks whether the property is set to true and, if so, adds a PII tag to the Dataset in DataHub.

We currently support the following actions to extract values from dbt meta and apply them to DataHub Datasets:

  • add_tag — add a Tag to the Dataset
  • add_term — add a Business Glossary Term to the Dataset
  • add_owner — add an Owner of the Dataset

Here’s an example of how we can map values from the dbt meta config to a Dataset:

Example of how details from dbt meta are mapped to a Dataset’s Tags, Terms, and Owners in DataHub
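
To make the mechanics concrete, here is a minimal Python sketch of how such a mapping rule could be evaluated. It is an illustration only, not DataHub's implementation: the rule layout (`match`, `operation`, `value`), the `apply_meta_mapping` helper, and the example meta keys and values are all assumptions made for this example. See the docs for the actual recipe syntax.

```python
import re
from typing import Any, Dict, List, Tuple

# Hypothetical rules keyed by dbt meta property name (illustrative layout only).
RULES: Dict[str, Dict[str, str]] = {
    "has_pii": {"match": "true", "operation": "add_tag", "value": "PII"},
    "business_term": {"match": ".+", "operation": "add_term"},
    "owner": {"match": ".+", "operation": "add_owner"},
}


def apply_meta_mapping(
    meta: Dict[str, Any], rules: Dict[str, Dict[str, str]]
) -> List[Tuple[str, str]]:
    """Return (operation, value) pairs for every rule whose pattern matches."""
    actions: List[Tuple[str, str]] = []
    for prop, rule in rules.items():
        if prop not in meta:
            continue
        raw = meta[prop]
        # Normalise booleans so that a YAML `true` compares as the string "true".
        value = str(raw).lower() if isinstance(raw, bool) else str(raw)
        if re.fullmatch(rule["match"], value):
            # A fixed value on the rule wins; otherwise reuse the meta value.
            actions.append((rule["operation"], rule.get("value", value)))
    return actions


# Hypothetical meta block from a dbt model.
meta = {"has_pii": True, "business_term": "Finance", "owner": "jdoe"}
actions = apply_meta_mapping(meta, RULES)
for operation, value in actions:
    print(f"{operation}: {value}")
```

Each matching rule yields one action, so a single meta block can produce a Tag, a Term, and an Owner at the same time, while a model with has_pii set to false produces no tag.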

Read the docs here!

🆕 Data Freshness Indicator

DataHub users can now easily see how recently a Dataset was updated using the Last Updated timestamp in the Stats details of a Dataset.

This freshness indicator, coupled with recent query activity, top users, and table & column stats, helps end-users make informed decisions about which datasets are relevant and trustworthy.

Last Updated is available as of v0.8.22 for Snowflake, BigQuery, and Redshift datasets and can be disabled by setting include_operational_stats: false in the source configuration.
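
As a rough sketch, an ingestion recipe expressed as a Python dictionary could switch the flag off as shown below. Only the `include_operational_stats` key comes from this post; the `disable_operational_stats` helper and the placeholder source type are assumptions for illustration, so substitute the source type and settings from your own recipe.

```python
import copy
from typing import Any, Dict


def disable_operational_stats(recipe: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the recipe with operational stats turned off."""
    updated = copy.deepcopy(recipe)  # leave the caller's recipe untouched
    source_config = updated["source"].setdefault("config", {})
    source_config["include_operational_stats"] = False
    return updated


# Placeholder recipe: the source type below is hypothetical, not a real plugin name.
recipe = {
    "source": {
        "type": "your-source-type",
        "config": {},
    },
}

updated_recipe = disable_operational_stats(recipe)
print(updated_recipe["source"]["config"])
```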

🆕 Introducing: Java REST Emitter

As our Community continues to grow rapidly, we are working hard to make it easier & easier for folks to get up and running with DataHub. With this in mind, we released a Java REST emitter library in v0.8.22 to programmatically generate metadata events from Java-based clients.

The io.acryl:datahub-client Java package offers REST emitter APIs, which you can use to emit metadata from your JVM-based systems. For example, the Spark Lineage integration uses the Java emitter to emit metadata events from Spark jobs.

Incubating Metadata Sources and Features

As of v0.8.20, we are incubating the following:

Metabase — currently in beta, this plugin extracts Charts, Dashboards, and associated metadata. So far we have tested it on PostgreSQL and H2 databases and are looking for community members to help test out functionality!

Removing Stale Metadata from the UI — using Stateful Ingestion, DataHub can soft-delete Tables and Views from SQL sources so they will not be surfaced in the DataHub UI.
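
The idea behind stale-metadata removal can be sketched in a few lines of Python: remember which entities the previous run saw, compare against the current run, and mark anything that disappeared as removed. This is a simplified illustration under assumptions, not the Stateful Ingestion implementation; the `find_stale_urns` and `soft_delete_markers` helpers, the marker shape, and the example URNs are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Union


def find_stale_urns(previous_run: Iterable[str], current_run: Iterable[str]) -> Set[str]:
    """Entities seen in the previous run but missing from the current one."""
    return set(previous_run) - set(current_run)


def soft_delete_markers(stale_urns: Iterable[str]) -> List[Dict[str, Union[str, bool]]]:
    """Build one 'removed' marker per stale entity (illustrative shape only)."""
    return [{"urn": urn, "removed": True} for urn in sorted(stale_urns)]


# Hypothetical table URNs captured by two consecutive ingestion runs.
previous = {
    "urn:li:dataset:(urn:li:dataPlatform:postgres,shop.public.orders,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:postgres,shop.public.customers,PROD)",
}
current = {
    "urn:li:dataset:(urn:li:dataPlatform:postgres,shop.public.orders,PROD)",
}

markers = soft_delete_markers(find_stale_urns(previous, current))
for marker in markers:
    print(marker)
```

Because the entity is only flagged as removed rather than erased, it drops out of the UI while its metadata stays available if the table reappears.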

Have feedback to share about our Metabase connector or handling stale metadata? Tell us all about it in our #ingestion Slack channel!

Community Contributions

Congrats to our first-time contributors!

@MikeSchlosser16 @pramodbiligiri @aditya-radhakrishnan @abiwill @gfalcone @iasoon @lvicentesanchez @grumbler @MugdhaHardikar-GSLab @jawadqu @nsbala-tw @merqurio @hyunminch @sudotty @cccs-eric @xiphl

Big thanks to our repeat contributors!

@treff7es @dexter-mh-lee @RyanHolstien @rslanka @dannylee8 @anshbansal @kevinhu @sgomezvillamor @mayurinehate @varunbharill @gabe-lyons @EnricoMi @hsheth2 @jjoyce0510

Connect with DataHub

  • Join us on Slack
  • Sign up for our Newsletter
  • Follow us on Twitter

