Onboarding

DataHub

Data Catalog

Metadata

5 Tips for Rolling out a Data Catalog

Paul Logan

Mar 3, 2023

Now that you have successfully deployed DataHub in your organization, it’s time to make the most of the platform by rolling it out to your stakeholders.

We know this can be a daunting task, so we reached out to members of the DataHub Community to hear how other folks have successfully introduced the tool within their companies.

SpaceX Launch

Photo by SpaceX on Unsplash

Read on to learn 5 concrete steps you can take to launch DataHub within your organization.

#1: Educate (& Deprecate) Around Data Catalogs

If DataHub is your first data catalog, take the time to educate your stakeholders about catalogs and their utility in the data stack. Help people understand the problems DataHub is meant to address and how you envision it fitting into their day-to-day workflows. The DataHub Blog is a great resource to help you get started, particularly:

3 Must-Haves for Metadata Management

If you have a previous data catalog, create a deprecation plan:

Create and forward-communicate hard deadlines for eventual deprecation with multiple stages where you’ll:

  • Stop adding new users
  • Disable UI
  • Disable API
  • Fully deprecate

Pair the deprecation with other onboarding techniques (champions, email campaigns, persona targeting). Make use of banners in your old tool & Slack announcements to increase awareness around the deprecation.

For example, the Data Discovery Team at one of our partners established both hard and soft deprecation deadlines:

  • Soft: If a user visited the old tool, it redirected them to DataHub. However, they were able to bypass the redirect if they were determined.
  • Hard: Their data team replaced the core functionality on the backend of their old catalog with DataHub, and tore down the previous infrastructure, disabling functionality from the old tool from that date on.
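The soft and hard stages above can be sketched as a single routing rule. This is a minimal illustration, not the partner's actual implementation: the URLs, the `bypass` query parameter, and the deadline date are all hypothetical placeholders.

```python
from datetime import date
from urllib.parse import parse_qs, urlparse

# All names, URLs, and dates below are hypothetical placeholders.
DATAHUB_URL = "https://datahub.example.com"
HARD_DEADLINE = date(2023, 6, 1)


def route_old_catalog_request(url: str, today: date) -> str:
    """Return where a request to the old catalog should land."""
    if today >= HARD_DEADLINE:
        # Hard deprecation: the old tool is gone, so always go to DataHub.
        return DATAHUB_URL
    query = parse_qs(urlparse(url).query)
    if query.get("bypass") == ["1"]:
        # Soft deprecation: determined users may still reach the old tool.
        return url
    # Soft deprecation default: redirect to DataHub.
    return DATAHUB_URL
```

Keeping the bypass behind an explicit parameter makes the old tool reachable for people who truly need it, while the default path builds the habit of landing in DataHub.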

#2: Enlist Champions & Shift Left

Early in your rollout, take the time to identify and partner with highly-motivated stakeholders in your organization to serve as champions, and team up with them to address their common pain points via DataHub.

Sample Use-Cases for Different Champion Personas

Not sure which stakeholders to engage? Look for members of your organization who have recurring workflows and/or responsibilities that could be improved by adopting DataHub.

For example, Data Engineers regularly make schema changes that can have unintended consequences for downstream dependencies. By leveraging DataHub’s Impact Analysis feature, they can start to proactively communicate breaking changes to downstream data consumers.
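To make the idea concrete for a champion, it can help to show what "downstream of a change" means. The sketch below is a toy breadth-first walk over a hand-written lineage map. It is not DataHub's implementation or API, and the asset names are invented for illustration.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed as its value.
LINEAGE = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["marts.revenue", "marts.order_counts"],
    "marts.revenue": ["dashboard.weekly_revenue"],
    "marts.order_counts": [],
    "dashboard.weekly_revenue": [],
}


def downstream_assets(lineage, start):
    """List every asset reachable downstream of `start`, nearest first."""
    seen = {start}  # guards against cycles and repeat visits
    queue = deque([start])
    ordered = []
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in seen:
                seen.add(child)
                ordered.append(child)
                queue.append(child)
    return ordered
```

The owners of the assets returned for a changed table are the people a Data Engineer would want to notify before shipping a breaking change.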

Here’s a breakdown of common use cases and personas to consider when you’re looking for DataHub champions:

Common use cases to target when searching for DataHub champions

Once you have identified your DataHub Champions and the targeted use cases to solve with DataHub, schedule one-on-one time for tests and progress checks to ensure they are empowered to get the most from the tool. Draw from their journey for examples, “aha! moments”, and key learnings you can broadcast to your wider audience, and partner with them to onboard their immediate teammates & teams to replicate the workflows.

For example, here’s how Tim Bossenmaier, Data Engineer at inovex, described the two key personas they targeted with their rollout:

  • Analysts: Whether they are business or data analysts or data scientists, we want to provide everyone who works with data with all the information they need in DataHub. For this reason, we pay special attention to the correctness of dataset schemas, the provision of schema descriptions, and correct lineage.
  • Data Stakeholders: Anyone who consumes data in a downstream way, mainly as report dashboards, etc. For them, it is very important that all KPIs are clearly defined in the glossary and linked to the appropriate entities in DataHub.

Tim’s team shared learnings and demonstrated DataHub’s features to a select group of these users via weekly sprint reviews during their rollout.

Shift Left: Capture Metadata at its Source

Meet your DataHub Champions where they are. It’s highly likely that they are already capturing documentation, annotation, quality tests, and more in their existing tools and workflows.

Whenever possible, “Shift Left” by capturing this rich context at its source and sending it to DataHub. This will remove points of friction for adoption and will empower your Champions to focus on generating high-quality metadata in the tools and environments they already use. You can learn more about the power of Shift Left here.

#3: Email Campaigns, Slack Channels, & Broadcasting Links

Establish a designated Slack/Teams channel for the rollout where you can post announcements and troubleshoot issues. Announce and link to the channel in the relevant company and team-wide channels.

Create a regular email campaign where you inform users of the state of the rollout and drive adoption with hooks that draw people in:

  • Link out to interesting & relevant discoveries in DataHub
  • Communicate timelines for the rollout & deprecations
  • Include materials from our blog and YouTube channel, or make your own to help users understand DataHub’s usefulness in the specific context of your org.
  • Speak to personas & value adds with featured quotes from your champions.

Be sure to link out to DataHub at every opportunity, on every surface you can find:

  • GitHub READMEs/PRs,
  • Slack/Teams,
  • emails,
  • app banners,
  • PagerDuty notifications,
  • birthday cards,
  • memes.

OK, so maybe not memes. But everything else!

DataHub’s job in your organization is to provide helpful context and visibility; the wider you broadcast links, the better a job it will do, and the more people will understand what it’s for and what they can do with it.

#4: Regular Onboarding Workshops & Office Hours

Schedule regular onboarding workshops and announce them in email updates and common channels. Prioritize making users’ lives easier, and ensure they walk away with net new knowledge. You can lean on your champions to find a story that will stick with users.

One of our partners found that the governance conversation was especially compelling, and DataHub was the perfect solution to this common problem.

Schedule regular office hours and broadcast that you are available for troubleshooting. Target presentations and demos for engineering/learning weeks and internal meetups to increase awareness of the rollout.

A sample week in our proposed onboarding program.

#5: Defining Success: Establish Goalposts, Owners, and KPIs

Before you start onboarding users, agree on what success looks like for your rollout by creating goalposts:

  • Establish success metrics and a future timeline of their expected state.
  • Assign owners that are accountable for these metrics as KPIs.
  • Create goalposts both for your stakeholders and for the team responsible for onboarding.
  • Always keep in mind: What is the key goal DataHub will help you accomplish?

An example of onboarding team goalposts:

  • Champions identified & onboarded for each team by X date.
  • 20 WAUs by X date, 40 by Y date, and 100 by Z date.
  • 5 onboarding workshops held by X date with total 100 attendees.
  • 90% of old catalog’s traffic moved to DataHub by 2 weeks before the deprecation.
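Goalposts like these are easier to own when everyone agrees on how they are calculated. Below is a minimal sketch of two of them, weekly active users (WAUs) and the share of catalog traffic that has moved to DataHub. It assumes you can export visit events as `(user, date)` pairs and visit counts per tool. The function names and input shapes are hypothetical, not part of DataHub.

```python
from datetime import date, timedelta


def weekly_active_users(events, week_start):
    """Count distinct users with at least one visit in the 7 days from week_start.

    `events` is an iterable of (user, visit_date) pairs (hypothetical shape).
    """
    week_end = week_start + timedelta(days=7)
    return len({user for user, day in events if week_start <= day < week_end})


def datahub_traffic_share(datahub_visits, old_catalog_visits):
    """Fraction of all catalog visits that landed on DataHub."""
    total = datahub_visits + old_catalog_visits
    return datahub_visits / total if total else 0.0
```

For instance, 90 DataHub visits against 10 visits to the old catalog gives a share of 0.9, which would meet the 90% goalpost above.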

An example of stakeholder/end-user goalposts to set and encourage governance expectations:

  • 60% of assets have ownership by X date.
  • Glossary terms added to all domains by X date.
  • All lineage populated for the Data Platform Team by end of the quarter.
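A governance goalpost such as ownership coverage can be tracked with a very small calculation. The sketch below assumes a list of asset records, each with an `owners` field. That record shape and the asset names are hypothetical and stand in for whatever export of your catalog you have available.

```python
def ownership_coverage(assets):
    """Fraction of assets that have at least one owner assigned."""
    if not assets:
        return 0.0
    owned = sum(1 for asset in assets if asset.get("owners"))
    return owned / len(assets)


# Hypothetical snapshot of five assets, three of them owned.
snapshot = [
    {"name": "marts.revenue", "owners": ["finance-data"]},
    {"name": "marts.order_counts", "owners": ["ops-data"]},
    {"name": "staging.orders", "owners": ["platform"]},
    {"name": "raw.orders", "owners": []},
    {"name": "raw.clicks"},
]
```

Calling `ownership_coverage(snapshot)` on this snapshot yields 0.6, which is exactly the 60% threshold in the example goalpost.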

In order for people to feel ownership of entities in DataHub, the introduction of dedicated roles can be helpful. Tim shared his expectations of a “data steward” in DataHub as an example:

“Since we don't want to manage all this data centrally, we have introduced the role of data stewards. We plan to have one data steward per area, who will then be responsible for keeping the KPI definitions in the glossary up to date and contacting teams when data appears to be corrupted or a KPI appears to be miscalculated.

Data stewards have special permissions that distinguish them from regular users. This is also the user group we are currently focusing on the most and for whom we are offering the introductory workshop sessions. We hope they will help us spread and establish DataHub throughout the organization.”

Have one team take point

A key point of success for one of our partners was its Data Discovery Team, which is the sole team responsible for discovering and ingesting the company’s data stack into DataHub. The team worked with technical stakeholders to identify new ingestion sources to bring into DataHub and owned developing any custom functionality that was required to onboard a new team.

Their Data Discovery Team also started using Great Expectations beyond DataHub’s out-of-the-box integration. They leverage the built-in Great Expectations features in DataHub to profile the data, and additionally, they have rolled out Great Expectations as a stand-alone tool to drive data quality across the organization. Having both of these efforts under the same team will make it easier to converge in the future.

With one team running point, it’s easy to create KPIs around what portion of your org’s data stack is represented in your catalog. This creates incentives for quick integration that don’t exist when the responsibility for ingestion falls to owners who don’t yet see the catalog as part of their daily workflow.

Bonus: Common Pitfalls

  • Don’t focus on too small a use-case: it’s a data party, and everyone’s invited.
  • Don’t start with too little data: ingest everything you can find that adds value.
  • Don’t just grab anything: review the sources you’re ingesting to prevent friction from poorly curated or blank datasets.
  • Don’t wait until the metadata is perfect: encourage users to fill in gaps and take ownership.

A big thank you to Tim Bossenmaier (inovex) and Juan Garcia Bazan for sharing their lessons learned!
