
Humans of DataHub

Community

Open Source

Data Engineering

Engineering

Humans of DataHub: Pablo Ochoa

Elizabeth Cohen

Feb 9, 2023


Before the winter holidays, the Acryl team (Maggie Hays, DataHub Community Product Manager, and Paul Logan, Developer Relations Lead) sat down with Pablo Ochoa, Data Governance and Big Data Architect at Graphenus.

Pablo Ochoa, Data Governance and Big Data Architect at Graphenus.


In this conversation, Pablo shares his journey to DataHub, what DataHub has enabled within Graphenus, his favorite features, and more. You don’t want to miss this conversation!

Humans of DataHub interview with Pablo Ochoa, closed captioning provided via YouTube

Conversation Transcript & Highlights

Edited for brevity & clarity

Maggie Hays: Welcome, folks, to another round of Humans of DataHub. Today, we are joined by my colleague Paul from our Dev Rel team, and also by Pablo Ochoa from our DataHub Community. Pablo, give us an introduction: tell us who you are, where you work, what you do, all that.

Pablo Ochoa: I’m from Spain, and I’m currently working at a company called Graphenus, which is also based here in Spain. We’re a Big Data platform known for its security and its versatility. I’m a big data and data governance consultant, and I’m responsible for the development.

Maggie Hays: Awesome. You’ve been in the DataHub community for a bit. How did you find DataHub? What were the reasons you started to look for a metadata platform or data catalog?

Pablo Ochoa: About a year ago, we began to ask ourselves what we were looking for. We wanted a data governance tool, but we weren’t sure where to begin or which was the best, so we did a bit of studying and research. After a few months of studying different products, we finally decided to choose DataHub, because it stood out for its state-of-the-art architecture and dependencies, and also because it’s easy to deploy.

Paul Logan: You’ve been so active in the Community since you joined, and we’re grateful for that. And one of the things that we’re curious about is, what do you enjoy most about the DataHub Community? What keeps you going?

Pablo Ochoa: It’s the variety and the knowledge of the people. In my opinion, a great part of the community is that there’s no expectation for everyone to be an expert in everything. There are a lot of people with a lot of interest in helping each other; everyone works together and helps where they can. It’s like the dream team. We have been in other people’s shoes: we have deployed DataHub, and maybe we have encountered some of the problems that people are currently struggling with. Because of our experience, we are able to offer help and guidance. It’s great, the kindness of these people, eager to help you at any time. Like really, any time! There are people in the United States, India, Europe…

Maggie Hays: Seriously, everywhere! From what I can tell, I think we have people from like 58–62 countries, which is just remarkable, right? It’s just totally remarkable. And so yeah, at any time of the day, there’s a conversation. Although it’s a good reminder to turn your phone off sometimes; [being unplugged] is important, too.

Paul Logan: What has DataHub enabled within your organization? What are some of your favorite features?

Pablo Ochoa: Well, ingestion is one of my favorite things. For example, automated metadata ingestion allows our users to get metadata from all of their data tools, so it’s kind of like having one centralized place. And, you know, being able to define terms or even domains, which help you either understand things better or keep everything in separate boxes, so nothing gets confused. So yeah, those are the two most remarkable things.

Maggie Hays: What would you like to see happen in the DataHub Community in the next six or 12 months?

Pablo Ochoa: I would like to see improvement on more security aspects. I’ve been doing some testing of, for example, the Apache Ranger integration, and I found that there may be some other things that could be patched up. For example, there’s just basic authentication. I’m also trying to get into that and read through the source code, since I’m hoping to contribute someday, I mean, short term. But yeah, I would also like to be able to deny what a user can see. That’s kind of a main problem to solve. Those would be the only two aspects because, as I said before, DataHub is great.

Maggie Hays: That’s awesome. Yeah, we’re starting to hear from folks that are, you know, ingesting tens of thousands or hundreds of thousands of entities, and the human brain can only take in so much information, right? So we need to figure out how to curate those. And also, for security reasons, data shouldn’t be accessed by every single person. So I’m excited. In the October Town Hall, we shared that we have a saved view feature coming out that I think will be helpful there, where we can start to curate and really kind of hide irrelevant or sensitive information.

Paul Logan: Do you have a favorite DataHub Slack channel? I know I see you active in #troubleshoot. Is there one that you like above all others?

Pablo Ochoa: There’s #all-things-deployment. I like it because it’s such a versatile channel that, you know, you have a myriad of things that you can either learn from, or you can jump in and help answer someone else’s question. It’s really, really great.

Maggie Hays: Yeah, there’s no shortage of ways to deploy these days. It’s pretty remarkable. All right, one last question for you. If you met someone on the street or at your office, and they were like, “oh my god, I’m going to join DataHub,” what would you tell them? What advice would you give them?

Pablo Ochoa: I mean, I would just tell them to be gentle with people. Because once you get started with Slacking all the time and stuff, it’s natural for you to be super involved and eager to help others. So there’s not much advice I would give: just be yourself, and be gentle with yourself and others.

Maggie Hays: I love that. Well, Pablo, thank you so much for taking the time. As we said, we are so grateful to have you here. You really are a shining example of how to show up and be a strong contributor in a variety of ways and help everybody else be successful. Thank you for your time.



If you are new to DataHub, just beginning to understand what “metadata” and “modern data stack” mean, or you’ve just read these words for the first time (howdy, friends! 🤠), let us take a moment to introduce ourselves and share a little history:

DataHub is an extensible metadata platform, enabling data discovery, data observability, and federated governance to tame the complexity of increasingly diverse data ecosystems. Originally built at LinkedIn, DataHub was open-sourced under the Apache 2.0 License in 2020. It now has a thriving community with over 6.2k members (and growing!) and 340+ code contributors, and many companies are actively using DataHub in production.

We believe that data-driven organizations need a reimagined developer-friendly data catalog to tackle the diversity and scale of the modern data stack. Our goal is to provide the most reliable and trusted enterprise data graph to empower data teams with best-in-class search and discovery and enable continuous data quality based on DataOps practices. This allows central data teams to scale their effectiveness and companies to maximize the value they derive from data.

Want to learn more about DataHub and how to join our community? Visit https://datahubproject.io and say hello on Slack. 👋
