Dec 11, 2023
This is where DataHub, the open-source metadata platform and data catalog project, shines. Practitioners look to DataHub as the central, authoritative source of truth for the lineage, provenance, and history of the data assets dispersed across their stacks. DataHub allows them to discover, access, and share data across decentralized teams.
Acryl Cloud is an enterprise-ready data management platform based on the DataHub project. It provides a central control plane for the decentralized data stack, marrying industry-leading data catalog capabilities with best-in-class features for data governance and observability. As a managed service, Acryl Cloud streamlines the setup, operation, and maintenance of DataHub, ensuring a secure, available, elastically scalable DataHub experience. Because many top contributors to the open-source DataHub project are also Acryl Data employees, Acryl Cloud customers benefit from direct access to domain experts with in-depth knowledge, resulting in rapid problem resolution and unmatched support.
Drawing on Acryl Data’s expertise in operating and scaling DataHub, Acryl Cloud offers organizations proven patterns for metadata-driven data governance and real-time data quality management. It also enables organizations to implement scalable shift-left governance patterns for data engineering and analytic development—integrating data quality, validation, and other checks into these workflows.
Let’s look at how Acryl DataHub’s features are essential to a mix of personae in different roles.
For the Data Platform Lead, DataHub is a foundational component of the modern distributed data platform, streamlining discovery for producers and consumers, and promoting simple, straightforward access. DataHub's ability to track data lineage provides insight into the origin and history of datasets, helping simplify maintenance and troubleshooting. It also functions as a key component of platform governance, tracking where data assets came from and what was done to them, as well as by whom.
However, a data platform must also have some built-in mechanism for defining, observing, and enforcing governance standards. This is necessary to ensure that business-critical data assets always meet defined standards for quality, freshness, formatting, ownership, and other criteria.
By itself, DataHub doesn’t give data platform leads all of the features they need to do this.
Acryl Cloud does. Data platform leads can use Acryl Cloud’s built-in features to create metadata-driven, platform-wide governance policies. They can create automated checks that monitor assets—including datasets, dashboards, charts, or pipelines—to ensure compliance with pre-defined rules. They can monitor a specific asset, like a C-level dashboard or a critical operational dashboard, or a class of assets—such as any or all business-facing assets. In the latter case, a data platform lead could define a policy which mandates that all business-facing assets must first undergo a battery of data quality and validation checks before being made available to consumers. They can also define automated actions (like notifications) that trigger when an asset passes or fails a rule, alerting key stakeholders.
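To make the idea of a policy over a class of assets concrete, here is a minimal Python sketch. It does not use the Acryl Cloud or DataHub APIs. The `Asset` record, the `business-facing` tag, the check names, and the `notify` callback are all hypothetical, chosen only to illustrate the pattern described above.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical asset record; not an Acryl Cloud or DataHub type.
@dataclass
class Asset:
    name: str
    tags: set[str] = field(default_factory=set)
    passed_checks: set[str] = field(default_factory=set)

# Assumed policy: every business-facing asset must pass these checks.
REQUIRED_CHECKS = {"freshness", "schema_valid", "has_owner"}

def evaluate(assets: list[Asset], notify: Callable[[str], None]) -> list[str]:
    """Return the names of non-compliant assets, notifying once per failure."""
    failures = []
    for asset in assets:
        if "business-facing" not in asset.tags:
            continue  # the policy applies only to the tagged class of assets
        missing = REQUIRED_CHECKS - asset.passed_checks
        if missing:
            failures.append(asset.name)
            notify(f"{asset.name} is missing checks: {sorted(missing)}")
    return failures

# Example run: collect notifications in a list instead of sending them anywhere.
messages: list[str] = []
assets = [
    Asset("exec_dashboard", {"business-facing"}, {"freshness", "schema_valid", "has_owner"}),
    Asset("ops_dashboard", {"business-facing"}, {"freshness"}),
    Asset("scratch_table"),
]
failed = evaluate(assets, messages.append)
```

In this sketch the notification target is just a callback, so the same rule could alert by chat message, email, or ticket without changing the policy logic.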
Acryl Cloud gives data platform leads accurate, up-to-date insight into the real-world state of their platform’s distributed data assets—production and otherwise. Data platform leads can also use Acryl Cloud to identify redundant or obsolete data sets, along with seldom-used data assets, helping optimize not just their cloud storage spend, but also compute and I/O costs (backups, indexing, replication, etc.). With Acryl Cloud’s built-in data lineage and impact analysis capabilities, platform leads can make informed decisions about what to do with these assets—deduplicating, archiving, compressing, or restructuring them based on operational requirements or organizational priorities.
DataHub is a cornerstone component in the Data Engineer’s toolkit, useful not only for discovering data, but also tracking and understanding data lineage, monitoring and improving data quality, automating different kinds of rote tasks, and promoting collaboration across decentralized teams.
Acryl Cloud brings shift-left governance to DataHub, integrating best-in-class features that make it much easier for data engineers to create, test, and validate their data pipelines; diagnose and debug data outages and performance problems; and define and enforce SLAs for data quality, freshness, documentation, and other criteria. Shift-left governance enables data engineering teams to produce more reliable data pipelines, catching and fixing issues earlier and reducing rework post-deployment. Shifting governance left into local development also lets data engineers iterate faster, accelerating the rate at which new data sources can be onboarded or integrated with existing data assets.
Thanks to Acryl Cloud’s observability and alerting capabilities, data engineers can detect and address different types of breaking changes—like schema modifications, or new database constraints—as soon as they happen. This helps accelerate debugging and troubleshooting, and also simplifies impact analysis: with Acryl Cloud, data engineers can more easily understand, fix, and/or work around the impact of data quality problems, breaking changes, and other anomalies. This results in production dataflows that are more performant and reliable, as well as easier to troubleshoot and maintain.
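The kind of breaking change mentioned above can be illustrated with a small, self-contained Python sketch that compares two schema snapshots. This is not how Acryl Cloud implements detection. The snapshot format (column name mapped to a type and a nullable flag) and the example columns are assumptions made for illustration.

```python
# Hypothetical schema snapshot: column name -> (type, nullable).
Schema = dict[str, tuple[str, bool]]

def breaking_changes(old: Schema, new: Schema) -> list[str]:
    """List changes that could break pipelines reading from or writing to a table."""
    problems = []
    for column, (old_type, old_nullable) in old.items():
        if column not in new:
            problems.append(f"column removed: {column}")
            continue
        new_type, new_nullable = new[column]
        if new_type != old_type:
            problems.append(f"type changed: {column} {old_type} -> {new_type}")
        if old_nullable and not new_nullable:
            problems.append(f"new NOT NULL constraint: {column}")
    return problems

# Example snapshots taken before and after a hypothetical schema change.
old = {"order_id": ("NUMBER", False), "amount": ("FLOAT", True), "coupon": ("VARCHAR", True)}
new = {"order_id": ("VARCHAR", False), "amount": ("FLOAT", False)}
changes = breaking_changes(old, new)
```

Columns that exist only in the new snapshot are deliberately ignored here, on the assumption that additive changes are usually safe for existing consumers.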
Finally, with Acryl Cloud’s support for flexible access controls, and its rich integration with cloud IAM and secrets management services, data engineers can easily integrate DataHub with their CI/CD workflows. This improves compatibility with DataOps, MLOps, and other full-lifecycle methodologies, allowing engineers to maintain security and control across all phases of development and deployment.
Analytics Leads understand that for data to be truly useful, it needs to be consistent, discoverable, and easily accessible, but effectively governed, too. They also recognize that data work is a team sport, and that promoting communication and collaboration among cross-functional teams is critical for analytic development. DataHub is a partial solution, giving analytic teams a way to not only discover and access datasets, but annotate them, too: adding context, clarifying ambiguities, or flagging issues.
But DataHub doesn’t tick off all of the requirements on the Analytics Lead’s checklist. Acryl Cloud does. With Acryl Cloud, Analytics Leads get the benefits of the same metadata-driven governance capabilities that data platform engineers depend on. Analytics Leads can create rules governing the quality, freshness, and transparency of production data, relying on Acryl Cloud to enforce them. Its built-in observability module, coupled with its real-time metadata monitoring and event-driven enforcement capabilities, ensures that data quality problems, rule violations, and other anomalies are detected as soon as they occur. Thanks to Acryl Cloud’s integration with Slack and similar platforms, data producers and other responsible stakeholders can get alerts in real time, too.
In addition, Acryl Cloud features like Glossary Term approval flows help Analytics Leads enforce consistent usage of canonical business terms among cross-functional teams, both during exploratory data analysis (EDA) and beyond. And with Acryl Cloud, data practitioners can easily explore metadata in DataHub using their web browsers, allowing users of notebooks, tools like Apache Superset, or BI tools (Looker, Tableau, and others) to discover data without going outside their established workflows.
Governance Leads are tasked with maintaining the consistency, accuracy, security, and compliance of data assets. In collaboration with Data Platform Leads and other stakeholders, they establish patterns, processes, and controls for an organization’s governance standards. DataHub gives them a way to make data assets discoverable and accessible, so that data practitioners can efficiently find and use those assets.
But Data Governance Leads need a reliable mechanism they can use to define and enforce governance policies across the modern, distributed data stack. They also need tools, methods, and patterns to enforce shift-left data governance practices, for example by embedding automated governance checks in CI/CD deployment workflows for data pipelines, models, and other assets.
With DataHub as its base, Acryl Cloud allows Governance Leads to both create and apply different kinds of metadata-driven governance policies and to enforce these rules earlier in the data lifecycle, in accordance with shift-left governance principles. Policies can be highly granular (applying to specific tables, columns, views, charts, dashboards, or other data entities) or extremely broad, applying to all production-ready data assets, to all PII, or to other categories. Beyond this, Acryl Cloud’s real-time observability module allows Governance Leads to continuously track updates, changes, and anomalies across a broad range of entities, from individual columns in relational tables to specific files in cloud storage. If a certain event or pattern of events is detected, Acryl Cloud will automatically alert stakeholders and, if applicable, take different kinds of predefined actions, such as initiating a rollback.
For shift-left governance, if a data pipeline or model deployment triggers a violation, Acryl Cloud will halt the process and notify the data producer of the breach. And Acryl Cloud’s support for Approval Workflows gives data producers a programmatic way to propose changes, which can be reviewed by data stewards or other stakeholders to ensure that they’re consistent with governance standards.
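A governance gate of this kind in a deployment workflow can be sketched in a few lines of Python. The example below is a generic illustration under stated assumptions, not Acryl Cloud’s mechanism: the dataset dictionary layout, the three rules, and the `gate` function are hypothetical.

```python
import sys

# Hypothetical governance rules applied to a proposed dataset definition.
def find_violations(dataset: dict) -> list[str]:
    violations = []
    if not dataset.get("owner"):
        violations.append("missing owner")
    if not dataset.get("description"):
        violations.append("missing description")
    for column in dataset.get("columns", []):
        if column.get("pii") and not column.get("masked"):
            violations.append(f"unmasked PII column: {column['name']}")
    return violations

def gate(dataset: dict) -> int:
    """Return a process exit code: 0 to proceed, 1 to halt the deployment."""
    violations = find_violations(dataset)
    for violation in violations:
        # Report each breach to the producer via the job log (stderr).
        name = dataset.get("name", "<unnamed>")
        print(f"governance violation in {name}: {violation}", file=sys.stderr)
    return 1 if violations else 0

# Example: a proposed change with no description and an unmasked PII column.
proposed = {
    "name": "customer_orders",
    "owner": "orders-team",
    "description": "",
    "columns": [
        {"name": "order_id", "pii": False},
        {"name": "email", "pii": True, "masked": False},
    ],
}
exit_code = gate(proposed)
```

In a real pipeline step, the script would typically end with `sys.exit(gate(proposed))`, since most CI systems treat a nonzero exit code as a failed job and stop the deployment.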
The Business User is not a monolith, but encompasses everyone from business, finance, operations, and marketing analysts, to product and project managers, to customer success managers—and many others. For these users, DataHub enables a unified view of available datasets, providing useful context about their origin, lineage, freshness, and dependencies. This helps simplify data discovery, eliminating many traditional barriers to data access.
DataHub provides a unified view of data, but Acryl Cloud expands on this to address the distinct challenges faced by business users in a diversity of roles. For example, Acryl Cloud’s built-in support for Data Contracts and Data Products is ideal for data mesh, or any kind of decentralized approach. Data Products can be assigned to specific Domains, allowing owners or data producers to make their products available for secure, governed consumption by other teams within the organization. And Acryl Cloud’s strong collaborative feature set enables decentralized teams to operate autonomously while at the same time fostering collaboration, promoting consistent understanding, and providing a programmatic feedback loop for changes.
For instance, governance authorities can use Glossary Terms to define canonical meanings for common business terms, helping data consumers consistently understand and apply them in their work. Similarly, Tags allow producers and consumers to label and annotate data products, simplifying discovery and helping promote responsible use. Approval Workflows give consumers a programmatic way to propose changes, which can be reviewed by owners, governance authorities, or other stakeholders. Collectively, these features not only bolster discoverability, communication, and collaboration in data mesh and other decentralized structures, but also help organizations ensure data products are at once accessible, governed, and trustworthy.
Finally, Acryl Cloud’s tight integration with cloud IAM services simplifies discovery for decentralized users, enabling access while enforcing security and compliance standards.
Acryl DataHub improves on the foundational capabilities of open-source DataHub, offering a comprehensive suite of features designed to meet the distinct needs of people working in a wide range of technical and business roles. An enterprise-ready, SaaS solution, Acryl Cloud gives you an available, reliable, and performant platform for unified metadata management, serving as a control plane for all of the data assets and products distributed across your stack.
With Acryl DataHub, you can not only consolidate and manage all of your metadata in a single, central location—simplifying governance—but also define, manage, and govern your data assets, irrespective of where they’re located or how they’re instantiated.
With all of this, Acryl Cloud isn’t just an “upgrade” to DataHub—it’s nothing less than a great leap forward in data management, observability, and governance.
Got data chaos, not data management? Get Acryl. It's time to DataHub—but smarter.