
Humans of DataHub: Pablo Ochoa

Humans of DataHub

Community

Open Source

Data Engineering

Engineering

Elizabeth Cohen

Feb 9, 2023


Before the winter holidays, the Acryl team (Maggie Hays, DataHub Community Product Manager, and Paul Logan, Developer Relations Lead) sat down with Pablo Ochoa, Data Governance and Big Data Architect at Graphenus.

Pablo Ochoa, Data Governance and Big Data Architect at Graphenus.


In this conversation, Pablo shares his journey to DataHub, what DataHub has enabled within Graphenus, his favorite features, and more. You don’t want to miss this conversation!

Humans of DataHub interview with Pablo Ochoa, closed captioning provided via YouTube

Conversation Transcript & Highlights

Edited for brevity & clarity

Maggie Hays: Welcome, folks, to another round of Humans of DataHub. Today, we are joined by my colleague Paul from our DevRel team, and also by Pablo Ochoa from our DataHub Community. Pablo, give us an introduction: tell us who you are, where you work, what you do, all that.

Pablo Ochoa: I’m from Spain, and I’m currently working at a company called Graphenus, which is also based here in Spain. We’re a Big Data platform known for its security and its versatility. I’m a big data and data governance consultant, and I’m responsible for development.

Maggie Hays: Awesome. You’ve been in the DataHub community for a bit, how did you find DataHub? Like, what were the reasons that you were starting to even look for a metadata platform or data catalog?

Pablo Ochoa: About a year ago, we began to ask ourselves what we were looking for. We wanted a data governance tool, but we weren’t sure where to begin or which one was best. So we did a bit of study and research. After a few months of evaluating different products, we finally chose DataHub because it stood out for its state-of-the-art architecture and dependencies, and also because it’s easy to deploy.

Paul Logan: You’ve been so active in the Community since you joined, and we’re grateful for that. And one of the things that we’re curious about is, what do you enjoy most about the DataHub Community? What keeps you going?

Pablo Ochoa: It’s the variety and the knowledge of the people. In my opinion, a great part of the community is that there’s no expectation for everyone to be an expert in everything; there are a lot of people with a lot of interest in helping each other. Everyone works together and helps where they can. It’s like a dream team. We have been in other people’s shoes: we have deployed DataHub, and maybe we have encountered some of the problems that people are currently struggling with. Because of our experience, we are able to offer help and guidance. It’s great, the kindness of these people, eager to help you at any time. Really, any time! There are people in the United States, India, Europe…

Maggie Hays: Seriously, everywhere! From what I can tell, we have people from 58–62 countries, which is just remarkable, right? And so yeah, at any time of the day, there’s a conversation. Although it’s a good reminder to turn your phone off sometimes; [being unplugged] is important, too.

Paul Logan: What has DataHub enabled within your organization? What are some of your favorite features?

Pablo Ochoa: Well, ingestion is one of my favorite things. For example, automated metadata ingestion allows our users to get metadata from all of their data tools, so it’s like having a centralized place of sorts. Then there’s being able to define terms, or even domains, that help you either understand things better or keep everything in separate boxes. Nothing gets confused. So yeah, those are the two most remarkable things.

Maggie Hays: What would you like to see happen in the DataHub Community in the next six or 12 months?

Pablo Ochoa: I would like to see improvements on the security side. I’ve been doing some testing, for example, of the Apache Ranger integration, and I found that there may be some other things that could be patched up. For example, there’s just basic authentication. I’m also trying to get into that and read through the source code, and I’m hoping to contribute someday, I mean, in the short term. But yeah, I would also like to see, for example, the ability to restrict what a user can see. That’s kind of a main problem to solve. Those would be the only two aspects because, as I said before, DataHub is great.

Maggie Hays: That’s awesome. Yeah, we’re starting to hear from folks who are ingesting tens of thousands, or hundreds of thousands, of entities, and the human brain can only take in so much information, right? So we need to figure out how to curate those. And also, for security reasons, data shouldn’t be accessible to every single person. So I’m excited: in the October Town Hall, we shared that we have a saved views feature coming out that I think will be helpful there, where we can start to curate and hide irrelevant or sensitive information.

Paul Logan: Do you have a favorite DataHub Slack channel? I see you active in #troubleshoot. Is there one that you like above all others?

Pablo Ochoa: There’s #all-things-deployment. I like it because it’s such a versatile channel: there’s a myriad of things you can learn from, or you can jump in and help answer someone else’s question. It’s really, really great.

Maggie Hays: Yeah, there’s no shortage of ways to deploy these days. It’s pretty remarkable. All right, one last question for you. If you met someone on the street or at your office, and they said, “Oh my god, I’m going to join DataHub,” what would you tell them? What advice would you give them?

Pablo Ochoa: I mean, I would just tell them to be gentle with people. Once you get started, being on Slack all the time and so on, it’s natural to become super involved and eager to help others. So there’s not much advice I would give other than this: just be yourself, and be gentle with yourself and others.

Maggie Hays: I love that. Well, Pablo, thank you so much for taking the time. As we said, we are so grateful to have you here. You really are a shining example of how to show up and be a strong contributor in a variety of ways and help everybody else be successful. Thank you for your time.



If you are new to DataHub, just beginning to understand what “metadata” and “modern data stack” mean, or you’ve just read these words for the first time (howdy, friends! 🤠), let us take a moment to introduce ourselves and share a little history:

DataHub is an extensible metadata platform, enabling data discovery, data observability, and federated governance to tame the complexity of increasingly diverse data ecosystems. Originally built at LinkedIn, DataHub was open-sourced under the Apache 2.0 License in 2020. It now has a thriving community with over 6.2k members (and growing!) and 340+ code contributors, and many companies are actively using DataHub in production.

We believe that data-driven organizations need a reimagined developer-friendly data catalog to tackle the diversity and scale of the modern data stack. Our goal is to provide the most reliable and trusted enterprise data graph to empower data teams with best-in-class search and discovery and enable continuous data quality based on DataOps practices. This allows central data teams to scale their effectiveness and companies to maximize the value they derive from data.

Want to learn more about DataHub and how to join our community? Visit https://datahubproject.io and say hello on Slack. 👋
