
Five Signs You Need a Unified Data Observability Solution

Metadata Management

Data Governance

Data Discovery

Compliance

Data Quality

John Joyce

Apr 17, 2024


The five-alarm data quality infernos always seem to break out in the middle of the night.

Like during your closing period—just hours before finance is expected to finalize quarterly figures. That’s when a sharp-eyed analyst notices `booked_revenue` for several regions is off by an order of magnitude. She suspects an error in the data pipelines used to integrate data from overseas regions. It’s going to be a long night—and not for the first time.

To add insult to injury, the data team is the last to know about the problem.

This seems to be a pattern. Like when a data pipeline stalls and the marketing data mart doesn’t get updated on time. Or a reverse ETL pipeline is misconfigured, feeding Salesforce inaccurate customer interaction data. With every data disaster, one theme holds true: downstream stakeholders—data users—are the first to discover the problem and alert the data team, usually via a Slack message or an email: “Hmm. This data doesn’t look quite right, does it?”

You can’t completely prevent data quality fires like this from happening, but—by leveraging a data observability tool—you can detect them as soon as they happen, resolve them quickly, and prevent them from causing widespread damage.
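What does detecting a fire “as soon as it happens” look like in practice? Here is a minimal sketch of a scheduled freshness check, assuming a warehouse reachable via SQLAlchemy; the connection string, table name, and threshold are placeholders, not details from the story above:

```python
# A minimal sketch of the kind of check an observability tool runs on a
# schedule. Connection string, table, and threshold are placeholders.
from datetime import datetime, timedelta, timezone

import sqlalchemy as sa

engine = sa.create_engine("postgresql://user:pass@warehouse/analytics")  # placeholder

def is_fresh(table: str, ts_column: str, max_lag: timedelta) -> bool:
    """True if the table received new rows within the allowed lag."""
    with engine.connect() as conn:
        last_loaded = conn.execute(
            sa.text(f"SELECT MAX({ts_column}) FROM {table}")
        ).scalar()
    # Assumes the timestamp column is stored as timezone-aware UTC.
    return last_loaded is not None and (
        datetime.now(timezone.utc) - last_loaded <= max_lag
    )

if not is_fresh("analytics.booked_revenue", "loaded_at", timedelta(hours=6)):
    raise RuntimeError("booked_revenue is stale; alert the data team before finance notices")
```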

Five Signs You Need Data Observability


A data observability tool is like loss-prevention for your data ecosystem, equipping you with the tools you need to proactively identify and extinguish data quality fires before they can erupt into towering infernos. Damage control is key, because upstream failures almost always have cascading downstream effects—breaking KPIs, reports, and dashboards, along with the business products and services these support and enable.

When data quality fires become routine, this corrodes trust. Stakeholders no longer trust their reports, dashboards, and analytics, jeopardizing the data-driven culture you’ve worked so hard to nurture. And when business products and services are impacted, customers start to lose faith in your brand, too.

Below are five signs you need a data observability solution.

  1. You’re constantly putting out fires. Getting ready for work means dressing up in a flame-retardant suit. Small fires break out whenever updates to data warehouse tables produce inaccurate, out-of-date, or incomplete data.

    Metrics that depend on this data—surfaced in executive dashboards and critical reports, or embedded as operational analytics that support business processes and services—silently inherit the bad values. Stakeholders ping your staff to double-check that the metrics are reliable, demanding: “Where did these numbers come from?”

  2. You don’t have a data quality smoke detector. Remember those angry stakeholders? They are your smoke detectors. You aren’t the first to know about data quality, availability, or compliance issues. Instead, you get alerts via urgent Slack messages—sometimes SHOUTED IN ALL CAPITAL LETTERS—tersely worded PagerDuty incidents, or impromptu requests to video-conference on Zoom.

    Basically, whenever you hear from certain stakeholders, it’s always news—to you—and it’s never good.

  3. Your stakeholders shouldn’t be the canaries in your coal mine. The angry-stakeholder scenario at least has an upside: you’re quickly alerted to problems. Sometimes, however, data issues go undetected for weeks or months, quietly smoldering before at last combusting into five-alarm, all-hands-on-deck infernos.

    For example, an “improvement” to a complex data pipeline results in certain transaction records being duplicated, accidentally inflating revenue figures over several consecutive quarters. Or a mapping error in a database migration script sets the wrong data type for a cloud data-archiving service, so its costs are recorded as non-billable metadata—instead of as actual expenses. It’s a bad day for all responsible stakeholders when issues like this inevitably come to light. (A routine uniqueness check, like the sketch that follows this list, would have caught the duplication within a day.)

  4. You don’t have a sprinkler system, so you aren’t prepared when data quality fires do occur. When you work with data, fires must and will occur. Full stop. Something, somewhere, somehow changes upstream, breaking all downstream dependencies. Or someone, somewhere makes a mistake versioning and deploying a critical data pipeline. Or a cloud SaaS/PaaS provider changes certain APIs. Breaking changes will happen.

    Right now, your fire preparation strategy is reactive: you don’t have a proactive plan for identifying and containing damage, and you can’t respond effectively when fires inevitably occur. Your team can’t even conduct realistic fire drills, because it doesn’t have dependable fire prevention and firefighting gear. Data quality issues and outages take longer to resolve and cause more damage than is strictly necessary.

  5. You have a hard time putting data quality fires out. Again, you’re almost never the first to know about a data quality issue or outage. And even when problems are brought to your attention, you have a hard time figuring out how to contain and extinguish them.

    Your data sources and assets are siloed across disparate cloud SaaS and PaaS services, as well as on-premises sites. Not only do you struggle to map and reverse engineer critical dataflows, but you’re overwhelmed by a sheer profusion of data pipelines and SQL queries, many of them redundant. Pinpointing the cause of a fire involves going from one tool interface to another, switching between browser tabs, or sessions in a terminal multiplexer. The upshot is that the same types of fires tend to break out again and again, in many cases causing damage in surrounding areas.
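The duplicated-transactions scenario in point 3 shows why routine checks matter: a uniqueness assertion would have surfaced the problem within a day rather than after several quarters. A minimal sketch, with a hypothetical `fct_transactions` table and placeholder connection string:

```python
# A sketch of a uniqueness check for the duplicated-transaction scenario in
# point 3. Table, column, and connection string are hypothetical.
import sqlalchemy as sa

engine = sa.create_engine("postgresql://user:pass@warehouse/analytics")  # placeholder

DUPLICATE_CHECK = sa.text("""
    SELECT transaction_id, COUNT(*) AS copies
    FROM fct_transactions
    GROUP BY transaction_id
    HAVING COUNT(*) > 1
""")

with engine.connect() as conn:
    duplicates = conn.execute(DUPLICATE_CHECK).fetchall()

if duplicates:
    # Inflated revenue is one dashboard refresh away; raise the alarm now.
    raise RuntimeError(f"{len(duplicates)} transaction_ids appear more than once")
```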


Sometimes your team spends so much time putting out fires it doesn’t have time for much else.

The problem is that most data teams still don’t have an overarching data observability strategy—thorough action plans, paired with an integrated set of tools for putting out data quality fires when they inevitably do occur.

An effective strategy gives both data teams and data leaders a way to understand and visualize the state of their entire data ecosystems, allowing them to detect, triage, and resolve data quality fires as soon as they occur.


For teams, an overarching data observability strategy ensures they:

  • Get alerted to anomalies as soon as they occur;
  • Have the tools and information they need to respond rapidly and effectively;
  • Have a way to notify and update stakeholders, a key part of damage control (a minimal sketch follows this list);
  • Have a bird’s-eye view of the state and quality of the data assets in their ecosystem.
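As a toy illustration of the first and third points, here is one way an anomaly alert might be routed to the channel where stakeholders already work. The webhook URL and message fields are placeholders; Slack incoming webhooks accept a JSON payload with a `text` field:

```python
# A toy illustration of alert routing: post a data-quality incident to Slack
# via an incoming webhook. The URL, asset, and impact strings are placeholders.
import json
import urllib.request

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

def notify(asset: str, issue: str, impact: str) -> None:
    """Tell stakeholders about an incident before they find it themselves."""
    message = (
        f":rotating_light: Data quality incident on {asset}\n"
        f"Issue: {issue}\n"
        f"Known downstream impact: {impact}"
    )
    request = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=json.dumps({"text": message}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)

notify(
    asset="analytics.booked_revenue",
    issue="row volume dropped 90% in the latest load",
    impact="quarterly close dashboard, regional revenue reports",
)
```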

Knowing you need a data observability strategy is one thing—implementing that strategy is another.

Introducing Acryl Observe

Acryl’s experience deploying and scaling Acryl Cloud in organizations large and small has taught us that data quality and data discovery aren’t just complementary, but inseparable from one another. This knowledge was hard-won: we didn’t go looking for it; it found us. Over and over again, we learned that data quality—defined by metrics like accuracy, consistency, integrity, and freshness—is critical to effective data discovery.

You can’t discover trustworthy data unless it’s both well-described and of good quality—and you can’t maintain data quality through time without clear visibility into impact (lineage), compliance, and accountability (ownership). Missing or incorrect context about ownership, purpose and documentation, compliance labels, or lineage and dependencies will compromise the quality of your data assets over time.
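One way to make “well-described” concrete is to treat each of those context facets as something you can measure per asset. The sketch below is purely illustrative; the `Asset` shape is hypothetical, not a DataHub model:

```python
# An illustrative completeness score over the context facets named above:
# ownership, documentation, compliance labels, and lineage. The Asset class
# is hypothetical, not a DataHub or Acryl API.
from dataclasses import dataclass, field

@dataclass
class Asset:
    name: str
    owners: list[str] = field(default_factory=list)
    description: str = ""
    compliance_tags: list[str] = field(default_factory=list)
    upstream_lineage: list[str] = field(default_factory=list)

def context_score(asset: Asset) -> float:
    """Fraction of the four context facets that are populated."""
    facets = [
        bool(asset.owners),
        bool(asset.description.strip()),
        bool(asset.compliance_tags),
        bool(asset.upstream_lineage),
    ]
    return sum(facets) / len(facets)

orders = Asset(name="analytics.fct_orders", owners=["data-eng"], description="Order facts")
print(f"{orders.name}: {context_score(orders):.0%} of context facets present")  # 50%
```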

Acryl Observe is a complete solution for implementing your data observability strategy. It seamlessly integrates with Acryl DataHub to unify Data Discovery, Data Governance, and Data Observability in a single, integrated solution.

With Acryl DataHub as its foundation, we built Observe to deliver the data observability features you need to:

Know about data quality fires as soon as they break out. You’re always the first to know when things go wrong, and you’re always notified where you work—in Slack, Teams, email, and other channels.

Proactively put out small fires—before they can explode. AI algorithms automatically detect novel data quality anomalies and dynamically surface recommendations for new types of checks based on changing patterns. (A toy illustration of the anomaly-detection idea follows this feature rundown.)

Fight fires where they are, with the right equipment. When fires break out, access to rich lineage metadata, documentation, and other resources helps you quickly pinpoint their sources. Automated impact analysis and integrated incident-management capabilities enable you to coordinate and focus your response, working around upstream and downstream problems.

Visualize and correct data quality hot-spots. Acryl Observe’s Data Health Dashboard gives you an at-a-glance overview of the state and health of your data assets, refreshing in near-real-time as they’re updated.

Unify Data Discovery, Data Governance, and Data Observability. Acryl Observe is built on top of Acryl DataHub, with the aim of unifying traditionally separate capabilities. Together, they empower your entire organization to discover trustworthy data, govern and organize it effectively, and monitor and maintain its quality through time.
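To give a feel for the anomaly-detection idea mentioned above (without claiming anything about Acryl Observe’s actual algorithms), here is a deliberately simple z-score baseline: learn “normal” from history and flag departures from it. The row counts are made up:

```python
# A deliberately simple anomaly baseline: flag a value that sits more than
# `threshold` standard deviations from its history. This illustrates the
# idea only; it is not Acryl Observe's detection algorithm.
from statistics import mean, stdev

def is_anomalous(history: list[int], today: int, threshold: float = 3.0) -> bool:
    """True if today's value departs sharply from historical behavior."""
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return today != mu
    return abs(today - mu) / sigma > threshold

daily_row_counts = [10_120, 9_980, 10_240, 10_060, 10_190, 9_930, 10_150]
print(is_anomalous(daily_row_counts, today=1_020))   # True: volume collapsed
print(is_anomalous(daily_row_counts, today=10_080))  # False: within normal range
```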


Discover the Acryl Observe Difference


Interested in learning why companies like Notion, Zendesk, and Ovo Energy trust Acryl?

👉 Learn more about Acryl Observe

👉 Contact us today to schedule a demo.


