
DataHub Summer’22 Rundown☀️

Metadata

Data Engineering

Project Updates

Open Source

Looker

Maggie Hays

Sep 7, 2022

👋 Hello, DataHub Enthusiasts!

It’s hard to believe summer is already winding down; I hope each of you found some time to soak up the sun and recharge. Ready to hear what the DataHub Community has been up to these last couple of months?

Let’s get into it!

The DataHub Community continues to THRIVE!

Every time I blink, there’s a new member in the DataHub Community — we’ve welcomed over 700 people to our vibrant Slack Community within the last two months, and there are no signs of things slowing down!

DataHub Community at a Glance

We continue to see more and more engagement in our Monthly Town Halls, and we are always thrilled to welcome new contributors to the project!

Join us on Slack · RSVP to our Next Town Hall · Follow us on Twitter

NEW! Bulk Edit Metadata via the DataHub UI

DataHub is making it easier than ever to keep your metadata up to date. Beginning with v0.8.43, you can add or remove Owners, Glossary Terms, Tags, and Domains for multiple entities, and update their Deprecation Status, all with a few clicks:

Example workflow of changing Deprecate Status and adding Owners to multiple entities at once
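
The UI is the easiest path, but if you ever want to script a similar bulk update, here is a minimal sketch using the DataHub Python emitter. The dataset URNs and the owner below are placeholders, and note that emitting an Ownership aspect this way replaces an entity's existing owners rather than appending to them:

```python
# Minimal sketch: apply the same owner to several datasets with the DataHub Python SDK.
# Assumes `pip install acryl-datahub` and a GMS endpoint at http://localhost:8080.
from datahub.emitter.mcp import MetadataChangeProposalWrapper
from datahub.emitter.rest_emitter import DatahubRestEmitter
from datahub.metadata.schema_classes import (
    OwnerClass,
    OwnershipClass,
    OwnershipTypeClass,
)

emitter = DatahubRestEmitter(gms_server="http://localhost:8080")

# Placeholder URNs -- replace with the entities you want to update.
dataset_urns = [
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.orders,PROD)",
    "urn:li:dataset:(urn:li:dataPlatform:snowflake,analytics.customers,PROD)",
]

ownership = OwnershipClass(
    owners=[
        OwnerClass(
            owner="urn:li:corpuser:jdoe",  # placeholder owner
            type=OwnershipTypeClass.TECHNICAL_OWNER,
        )
    ]
)

for urn in dataset_urns:
    # Note: this writes the full Ownership aspect, replacing any existing owners.
    emitter.emit(MetadataChangeProposalWrapper(entityUrn=urn, aspect=ownership))
```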

Want to learn more? Watch John Joyce’s walkthrough from the July Town Hall here:


Looker Integration: Improved Search Experience

We’ve heard feedback from the Community that end-users want an easier way to search for Looker Looks and Dashboards that contain a specific measure or dimension.

So we built just that! Starting with v0.8.44, DataHub indexes every measure and dimension referenced in Looks and Dashboards, so searching for a specific field surfaces the Looks and Dashboards that use it near the top of your results.

Hear all about it from Gabe Lyons during the August Town Hall:


New to DataHub? Get started with our Docs Site!

We know there’s a lot to learn when it comes to getting ramped up on DataHub, so we want to ensure that folks have the resources they need to get up and running as quickly as possible.

We recently rolled out some significant improvements to the DataHub Docs Site to make it easier and more intuitive for DataHub Developers and End-Users alike to navigate our resources.

The DataHub Docs Site has a new look!

Keep an eye on that space — we’ll be rolling out more user guides and tutorials in the upcoming months!

BIG Improvements to UI-Based Ingestion

We’re on a mission to make ingesting metadata into DataHub as easy as possible. Starting with v0.8.42, you can configure metadata ingestion for Snowflake, BigQuery, Looker, and Tableau directly in the UI with an easy-to-follow form.
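
Behind the form, ingestion is still driven by a recipe, and the same configuration can be run programmatically. Here is a rough sketch of a Snowflake recipe executed through the Python ingestion Pipeline; the connection values are placeholders, and config field names can differ between connector versions, so check the Snowflake source docs for your release:

```python
# Rough sketch of running a Snowflake ingestion recipe programmatically.
# The UI builds an equivalent recipe behind the form; connection values here
# are placeholders and field names may differ by connector version.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "snowflake",
            "config": {
                "account_id": "my_account",        # placeholder
                "username": "datahub_user",        # placeholder
                "password": "${SNOWFLAKE_PASSWORD}",
                "warehouse": "COMPUTE_WH",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)

pipeline.run()
pipeline.raise_from_status()
```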

We know that many teams use a combination of the DataHub CLI and UI to ingest metadata, so it’s critical to provide a unified view of run history and outcomes. With this in mind, we rolled out some massive improvements to UI-based Ingestion in v0.8.44, including:

  • view live logs during job execution
  • view ingestion run summary (i.e., number of entities ingested)
  • rollback functionality in case something goes astray
  • unified overview of UI- and CLI-based ingestion runs

Want to see it in action? Check out Chris Collins’ demo from the August Town Hall:

Metadata Ingestion Improvements, Galore!

The DataHub Community is hard at work ensuring our existing Ingestion Sources are performant and extract as much valuable metadata as possible. Here are some highlights from v0.8.41 through v0.8.44:

Extracting New Metadata Elements

  • The Chart entity now supports chartUsageStatistics, which feeds into search ranking so the most heavily used charts surface toward the top of results
  • dbt ingestion supports auto-extracting owner from the meta block

Improvements to Existing Sources

  • Stateful Ingestion now supported for the Glue Connector
  • An improved Snowflake Connector is now available; we expect it to reduce ingestion run time and configuration complexity.
  • Configure your BigQuery Connector to profile only a subset of tables (a sketch of the relevant config follows below)
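
As an illustration of profiling only a subset of tables, here is a hedged sketch of the source section of a BigQuery recipe. The project and patterns are placeholders, and the profile_pattern field names follow the BigQuery source docs, so verify them against the connector version you are running:

```python
# Hedged sketch: restrict BigQuery profiling to a subset of tables.
# This fragment could be dropped into the "source" section of a recipe
# like the Snowflake example above; field names are illustrative.
bigquery_source_config = {
    "type": "bigquery",
    "config": {
        "project_id": "my-gcp-project",        # placeholder
        "profiling": {"enabled": True},
        # Only profile tables in the marts dataset; skip staging tables.
        "profile_pattern": {
            "allow": [r"my-gcp-project\.marts\..*"],
            "deny": [r".*_staging"],
        },
    },
}
```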

Miscellaneous Ingestion Updates


265 people have contributed to DataHub to date

Between July and August 2022, we merged 366 pull requests from 50 contributors, 15 of whom contributed for the first time:

@abiwill @aditya-radhakrishnan @aezomz @alexey-kravtsov @amanda-her @Ankit-Keshari-Vituity @anshbansal @chriscollins3456 @daha @de-kwanyoung-son @divyamanohar-stripe @dougpm @gabe-lyons @glinmac @hemanthkotaprolu @hsheth2 @Jiafi @jjoyce0510 @justinas-marozas @koconder @ksrinath @leifker @liyuhui666 @maggiehays @Masterchen09 @mayurinehate @milimetric @mohdsiddique @ms32035 @MugdhaHardikar-GSLab @NavinSharma13 @neojunjie @ngamanda @NoahFournier @pedro93 @remisalmon @rslanka @RyanHolstien @salihcaan @Santhin @sgomezvillamor @shirshanka @skylersinclair @szalai1 @tengis @timcosta @topleft @treff7es @vcs9 @xiphl

We are endlessly grateful for the members of this Community — we wouldn’t be here without you!

One Last Thing —

I caught up with my teammate, John Joyce:

Maggie: Hey, John! It’s been a busy summer for DataHub. What upcoming feature are you most excited about?

John: I’m most excited about Advanced Search, which is currently in the works. I think it will allow end users to begin asking much more interesting questions of their Metadata Graph. A lot of power is going to be unlocked by the flexibility this will offer!

M: Totally agree — Gabe’s demo during August Town Hall was so cool! Next question: what song have you been playing on repeat this summer?

J: “I Need a Forest Fire” by Bon Iver & James Blake!

That’s it for this round; see ya on the Internet :)


Connect with DataHub

Join us on Slack · Sign up for our Newsletter · Follow us on Twitter
