Humans of DataHub: Vincent Koc

Data Engineering

Open Source

Humans of DataHub

Community

Engineering

Elizabeth Cohen

Mar 5, 2023


For this edition of Humans of DataHub, we had the pleasure of speaking with Vincent Koc, Head of Data at hipages, Australia’s largest online trade marketplace, connecting homeowners or businesses with trusted tradespeople.

In this chat, Vincent shares his experience with DataHub, figuring out the data governance journey, his favorite DataHub features, and advice for data folks. This conversation was another reminder of how amazing the DataHub community is, and how much we have to learn from each other.

Humans of DataHub interview with Vincent Koc, closed captioning provided via YouTube

Conversation Transcript & Highlights

Edited for brevity & clarity

Maggie Hays: Welcome to the latest round of Humans of DataHub! Today we are joined by Vincent, one of our community members. Vincent, please introduce yourself – tell us who you are, where you work, and what you do.

Vincent Koc: Hi, I’m Vincent Koc, based in Sydney, Australia. I work as the head of data for a digital native organization called hipages. hipages is Australia’s largest online trade marketplace. We effectively match traders — or as we like to call them, tradespeople — with consumers in the market. Being a digital native and marketplace organization, we have a vast array of data assets. And that’s how I stumbled across DataHub.

Maggie Hays: Awesome. How did you find DataHub? What was the reason for even starting to look for something like DataHub?

Vincent Koc: My predecessors were early adopters of DataHub as a technology. As an organization, we’re quite big on embracing open technology and open standards, and DataHub has been strong in this space as open technology that others can adopt, which is quite critical when we talk about things like data contracts, data lineage, and data governance. So yeah, my predecessors opted for DataHub, and as I started to explore its capabilities and what I could do with it, I realized you can really shape it for your organization and its data governance.

Maggie Hays: Yeah, definitely. Is there anything you enjoy most about the DataHub community?

Vincent Koc: The town halls, which unfortunately I can’t join all the time given the timezone difference. But I just love the energy in the announcements, and I enjoy watching the feature request channel. Seeing the product come to life, shaped by the community, and being able to see the impact the community has on the roadmap and the development of DataHub is quite interesting.

Maggie Hays: That’s one of my favorite parts of my job. It’s so much fun to be able to truly shape the roadmap around what the community is asking for, and collaborate with folks on really fleshing out those user stories or use cases.

You mentioned that you see DataHub as something malleable, something you can shape to the needs of your work. What are some of the use cases or problems that DataHub has addressed within hipages?

Vincent Koc: We’ve gone through many iterations of our data architecture, from warehouse to lake and now on to a lakehouse architecture. DataHub, with its extensible data lineage and ability to catalog various BI tools and systems, has helped mostly with visibility into what’s going on.

Every company dreams about this perfect world of data governance, but it’s imperfect. We have to somehow bring structure to it. So DataHub allows us to be a bit more calculated with the choices we make when dealing with legacy architecture, reports, and systems.

Paul Logan: That’s super cool. Is there anything that you’re really excited to see happen with DataHub in 2023?

Vincent Koc: For me, it’s this space around data contracts. You can define a contract and you can enforce it, but to understand the impact of a data contract, and at least understand the enforcement, you need lineage in place. You need to understand what is downstream and upstream of a data asset.

What are the data products in your ecosystem? Who’s using them? How often? That’s the only way we can actually work out whether a contract has been violated or whether we’re meeting SLAs.

We’re trying to centralize a lot of our metadata and somehow work it around lineage. I know DataHub has done a lot of work around the domain model and different concepts of ownership as well.

So I’m super excited about just really going on that journey and that transformation in terms of data governance.
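Vincent’s point, that contract enforcement is only actionable when you also know what sits downstream, can be sketched in a few lines. This is a hypothetical, stdlib-only toy (the `Contract` class, asset names, and lineage map are all made up for illustration), not DataHub’s actual data contract API:

```python
# Toy illustration: a contract check is only useful when paired with
# lineage, so a violation can be mapped to the assets it impacts.
from dataclasses import dataclass


@dataclass
class Contract:
    asset: str
    required_fields: set  # fields every record on this asset must carry


# Hypothetical lineage map: asset -> its direct downstream consumers.
LINEAGE = {"db.orders": ["dbt.fct_revenue", "looker.sales_dashboard"]}


def check_contract(contract: Contract, record: dict) -> list[str]:
    """Return the downstream assets impacted if `record` violates the contract."""
    missing = contract.required_fields - record.keys()
    if not missing:
        return []  # contract satisfied, nothing is impacted
    return LINEAGE.get(contract.asset, [])


c = Contract("db.orders", {"order_id", "amount"})
print(check_contract(c, {"order_id": 1}))
# → ['dbt.fct_revenue', 'looker.sales_dashboard']  (amount missing)
```

The interesting part is the return value: not just "violated yes/no", but the blast radius, which is exactly why lineage has to be in place first.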

Maggie Hays: We’re in the early stages of figuring out what a data contract means within DataHub. Feedback from folks like your team, and your use cases, is going to be paramount to helping us really define that.

For example, there’s this whole concept of data stewardship, giving ownership back to the business. But how do we actually give them visibility into their catalog or what reports they have? We need to make it tangible for businesses and people in an organization who may not be as tech- or data-literate. We need the mechanisms to make that visual and more malleable for them to drive adoption.

Maggie Hays: You spoke a little bit about lineage and domains. What’s your personal favorite use case or feature within DataHub?

Vincent Koc: I think the most interesting one for me is DataHub Tags, just that any asset or database can be tagged. And that tag can come from anywhere. You can query the data that lives within DataHub based on those tags. A common example would be, “Hey, let’s tag this data as sensitive or PII, and let’s see which downstream reports are inheriting this data.”

Or knowing what’s using a lot of PII, or legacy, or unstable data, things like that.

Tags are quite useful in the sense that you can expand on their capability quite simply. You can use the SDKs, code, your systems, or whatever rules you want to define those tags as well.
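The "tag a source as PII and see which downstream reports inherit it" idea Vincent describes boils down to a graph traversal over lineage. Here is a minimal, self-contained sketch of the concept; the lineage graph and asset names are invented, and this is not the DataHub SDK itself:

```python
# Toy sketch: find every asset downstream of anything tagged "PII"
# by breadth-first search over a lineage graph.
from collections import deque

# Hypothetical lineage graph: asset -> its direct downstream assets.
LINEAGE = {
    "db.users": ["dbt.dim_customers"],
    "dbt.dim_customers": ["looker.revenue_report", "looker.churn_dashboard"],
    "db.jobs": ["looker.jobs_report"],
}

# Tags applied at the source assets.
TAGS = {"db.users": {"PII"}}


def downstream_assets_with_tag(tag: str) -> set[str]:
    """Return every asset downstream of any asset carrying `tag`."""
    found: set[str] = set()
    queue = deque(asset for asset, tags in TAGS.items() if tag in tags)
    while queue:
        asset = queue.popleft()
        for child in LINEAGE.get(asset, []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


print(sorted(downstream_assets_with_tag("PII")))
# → ['dbt.dim_customers', 'looker.churn_dashboard', 'looker.revenue_report']
```

In a real deployment the tags and lineage edges would come from DataHub’s metadata graph (populated via its SDKs and ingestion connectors) rather than hard-coded dictionaries, but the propagation logic is the same shape.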

Paul Logan: You’ve been a part of the community for a while now and I’m sure you have some tips and advice for anyone who’s just joining the community. What would you say?

Vincent Koc: For me, it’s more around Slack communities and data communities in general.

Don’t be scared — not everyone knows the answers. Some people are super technical, and some are not, but we’ve all got our own skills in different ways. Feel free to put your hand up and go, “Hey, this doesn’t make sense”. I’m sure there’s someone out there who will lend you a hand or give you some tips and tricks.

At the same time, if you want to learn, contribute, and build features yourself, nothing’s stopping you. Just give it a go.

It’s also a great way to understand how the systems and the ecosystem shape up. For example, if I were to start building some features on DataHub, that’s a really good way for me to understand how data governance works, how lineage works, and to start to conceptualize that and learn from that experience.

I think it’s a great learning experience as well as a way to share ideas and different ways of solving problems within the community.

Maggie Hays: Amazing. That’s the dream response right there. In all seriousness, we’re just so grateful for you and the whole hipages team. You have been real champions of the project and you’re doing great work — both within the community and in your organization. Thank you so much for taking the time to talk with us.

Vincent Koc: Oh, my pleasure. Thank you so much, Maggie!


If you are new to DataHub, just beginning to understand what “metadata” and “modern data stack” mean, or you’ve just read these words for the first time (howdy, friends! 🤠), let us take a moment to introduce ourselves and share a little history:

DataHub is an extensible metadata platform, enabling data discovery, data observability, and federated governance to tame the complexity of increasingly diverse data ecosystems. Originally built at LinkedIn, DataHub was open-sourced under the Apache 2.0 License in 2020. It now has a thriving community with over 6.5k members (and growing!) and 350+ code contributors, and many companies are actively using DataHub in production.

We believe that data-driven organizations need a reimagined developer-friendly data catalog to tackle the diversity and scale of the modern data stack. Our goal is to provide the most reliable and trusted enterprise data graph to empower data teams with best-in-class search and discovery and enable continuous data quality based on DataOps practices. This allows central data teams to scale their effectiveness and companies to maximize the value they derive from data.

Want to learn more about DataHub and how to join our community? Visit https://datahubproject.io and say hello on Slack. 👋

