
Humans of DataHub: Mert Tunç

Humans of DataHub

Data Engineering

Open Source

Community

Elizabeth Cohen

Dec 12, 2022


We are excited to share our next installment of Humans of DataHub, featuring Mert Tunç, a Staff Software Engineer at Udemy. Udemy is a global destination for teaching and learning online.

Mert Tunç 👋

Mert shared his favorite things about the DataHub Community, how Column-level Lineage has supercharged his organization, and much more. You don’t want to miss this interview!


How did you first learn about DataHub?

At Udemy, we had some pain points that could be solved with a data catalog tool. While researching open-source data catalog solutions, we discovered DataHub. DataHub stood out as an up-and-coming alternative: its capabilities and vision aligned with what we were looking for, and it has an incredibly engaged community.

What do you enjoy most about the DataHub Community?

The DataHub Community is active: people learn from each other and have the opportunity to contribute directly back to the project. One reason such a healthy community has formed is the guidance given to contributors. The core DataHub team is committed to improving the project and includes Community members through collaboration: they ask for feedback and offer guidance on documentation, code contributions, and answering questions. You can tell they want to encourage participation. It is a unique experience to be part of such a collaborative product development process.

What has DataHub enabled within your organization?

Our current focus is on the lineage side of DataHub. Lineage was a must-have for many critical use cases, and the impact analysis UI makes migrations and refactors much more manageable. At the same time, we use the GraphQL endpoint for more complex queries over the lineage data stored uniformly in the DataHub backend.
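For context, here is a minimal sketch of what such a lineage query can look like against DataHub's GraphQL endpoint, using the searchAcrossLineage query. This is not Udemy's actual setup: the endpoint URL, token, and dataset URN are placeholders you would adjust for your own deployment.

```python
# Minimal sketch: querying DataHub's GraphQL API for upstream lineage.
# The endpoint URL, token, and dataset URN below are placeholders.
import requests

DATAHUB_GRAPHQL = "http://localhost:8080/api/graphql"  # adjust to your deployment
TOKEN = "<personal-access-token>"

# searchAcrossLineage walks the lineage graph from a starting entity,
# here following UPSTREAM edges (the dataset's producers).
query = """
query upstreams($urn: String!) {
  searchAcrossLineage(
    input: { urn: $urn, direction: UPSTREAM, query: "*", start: 0, count: 10 }
  ) {
    searchResults {
      entity {
        urn
        type
      }
    }
  }
}
"""

variables = {"urn": "urn:li:dataset:(urn:li:dataPlatform:hive,db.table,PROD)"}

resp = requests.post(
    DATAHUB_GRAPHQL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"query": query, "variables": variables},
)
resp.raise_for_status()

# Print each upstream entity found in the lineage graph.
for result in resp.json()["data"]["searchAcrossLineage"]["searchResults"]:
    print(result["entity"]["type"], result["entity"]["urn"])
```

Switching the direction to DOWNSTREAM gives the consumer side, which is the kind of query that powers impact analysis before a schema change.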

What are you most excited to see within DataHub?

I am excited about everything! DataHub is on a great path and is taking steps to become a standard for being *the* hub for metadata. That being said, my pick would be the UI/UX improvements, especially the ones making the platform more accessible to those not directly from the data domain, like business users.

What’s your favorite DataHub feature/use case?

Column-level lineage is my favorite feature; it gives my team a uniform view of our data dependencies. Before, reliably finding the producers and consumers of a data entity was a nightmare. Lineage has made essential day-to-day operations, such as schema changes and refactorings (to name a few), much easier and more accurate.

What is your favorite DataHub slack channel, and why?

#announcements. I want to follow news about the tool and updates from DataHub’s core team, and the #announcements channel is how I do that. The team highlights new releases, features, and upcoming metadata-related conferences they’ll attend. They also share the agendas for the monthly DataHub Town Halls.

What advice would you give to someone just joining the DataHub Community?

Don’t hesitate to ask questions or respond to others. Slack is an excellent resource because of the active community; most questions or issues you will face have likely already been discussed. Most importantly, feel free to open some PRs. It is surprisingly easy to create a pull request and get a quick review. My personal favorites are documentation updates and fixes: even a change that feels trivial to you can make a huge difference to the docs!


If you are new to DataHub, just beginning to understand what “metadata” and “modern data stack” mean, or you’ve just read these words for the first time (howdy, friends! 🤠), let us take a moment to introduce ourselves and share a little history:

DataHub is an extensible metadata platform, enabling data discovery, data observability, and federated governance to tame the complexity of increasingly diverse data ecosystems. Originally built at LinkedIn, DataHub was open-sourced under the Apache 2.0 License in 2020. It now has a thriving community with over 5.4k members (and growing!) and 300+ code contributors, and many companies are actively using DataHub in production.

We believe that data-driven organizations need a reimagined developer-friendly data catalog to tackle the diversity and scale of the modern data stack. Our goal is to provide the most reliable and trusted enterprise data graph to empower data teams with best-in-class search and discovery and enable continuous data quality based on DataOps practices. This allows central data teams to scale their effectiveness and companies to maximize the value they derive from data.

Want to learn more about DataHub and how to join our community? Visit https://datahubproject.io and say hello on Slack. 👋

