Founded 32 years ago, MYOB is a leading business management platform in Australia and New Zealand, providing solutions to streamline business workflows, from finance and inventory management to employee onboarding and payroll.
In 2020, MYOB embarked on a journey to implement a data mesh architecture to serve its growing number of internal and third-party data producers and consumers. At the core of the platform, Snowflake and dbt were chosen for data storage, computation, and transformation.
As more producers and consumers came on board and dependencies started to emerge between them, it quickly became clear that MYOB needed a robust catalog and lineage tool to keep track of all its data. After an extensive comparative analysis of the tools available, MYOB settled on Acryl Cloud as its metadata platform. The comprehensive features of the tool, the smooth UI, a complete API, and the fact that it was built on the popular open-source DataHub Project, well supported by the Acryl team, made the decision to adopt the product easy for MYOB.
The data platform at MYOB relies on dbt as its transformation tool of choice. As of writing, there are more than a thousand dbt transformations on the platform. As data is transformed from its initial raw format into layers of transformed and calculated datasets, the dependency tree grows quickly, and a change to one dataset’s schema can affect several layers of downstream datasets. As a result, data consumers further down the tree often found their datasets, reports, or dashboards broken without any warning.
With Acryl, the owners of those downstream datasets could open Acryl’s UI and identify the upstream datasets they depend on. They could then contact the owners of those datasets through Slack or other means and work with them to update their dbt transformations, which was a big milestone in the process. Next, the MYOB team set out to create automatic warnings for changes to the schema of parent datasets by embracing the shift-left philosophy. Here is how.
All dbt transformations at MYOB are stored in GitHub repositories. Each repo has a CI/CD pipeline linked to it that runs and tests all dbt transformations. Buildkite is MYOB’s tool of choice for CI/CD pipelines. A typical pipeline looks like this:
The last step of the pipeline, “Notify data consumers,” is the magical new step that sends notifications to the owners of downstream datasets if the change to the current model impacts them. The Pull Request (PR) creator cannot merge the PR until they manually unblock this step. Once triggered, the step sends notification emails to all direct data consumers of the dbt model. The notification service can also write messages to a queue, so other systems within the organization can subscribe to those messages and act accordingly.
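As a rough illustration of that fan-out, the sketch below hands one serialized message to several sinks. It is a minimal sketch, not MYOB's implementation: the `ChangeNotification` fields, the `publish` helper, the example model name and URL, and the in-memory list and `queue.Queue` standing in for an email sender and a message queue are all assumptions made for illustration.

```python
from __future__ import annotations

import json
import queue
from dataclasses import asdict, dataclass


@dataclass
class ChangeNotification:
    """Hypothetical message shape; the field names are assumptions."""
    model: str              # dbt model touched by the pull request
    pull_request_url: str   # where consumers can review and comment
    recipients: list[str]   # owners of the direct downstream datasets


def publish(notification: ChangeNotification, sinks) -> str:
    """Serialize the notification once and hand the same payload to every sink."""
    payload = json.dumps(asdict(notification), sort_keys=True)
    for sink in sinks:
        sink(payload)
    return payload


# Stand-ins for real delivery channels (assumed, for illustration only).
email_outbox: list[str] = []             # would be an email sender
message_bus: queue.Queue = queue.Queue()  # would be an organization-wide queue

note = ChangeNotification(
    model="orders_daily",
    pull_request_url="https://example.invalid/pull/1",
    recipients=["owner@example.invalid"],
)
sent_payload = publish(note, [email_outbox.append, message_bus.put])
```

Treating each channel as a plain callable keeps the step open to new subscribers: adding another system means adding one more sink, with no change to how the message is built.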
The idea of the dbt change notification is to automatically send notifications to the data consumer before the dbt change is merged into the default branch and deployed to production, giving the data consumer the ability to review what’s being changed and leave comments in the Pull Request to start a conversation with the change creator.
All owners of the direct downstream datasets (first-degree consumers) receive an email.
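Restricting recipients to first-degree consumers can be sketched as a one-hop lookup over lineage. The example below assumes the lineage and ownership metadata have already been fetched into plain dictionaries; the dataset names, email addresses, and the `direct_consumer_owners` helper are hypothetical, and it does not show how the metadata would be retrieved from Acryl.

```python
def direct_consumer_owners(model, downstream_of, owners_of):
    """Return sorted, de-duplicated owners of datasets exactly one hop downstream."""
    recipients = set()
    for dataset in downstream_of.get(model, []):
        recipients.update(owners_of.get(dataset, []))
    return sorted(recipients)


# Hypothetical lineage: each key maps to its direct downstream datasets.
downstream_of = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["orders_daily", "orders_by_region"],
    "orders_daily": ["finance_dashboard"],
}

# Hypothetical ownership metadata.
owners_of = {
    "orders_daily": ["ana@example.invalid"],
    "orders_by_region": ["ana@example.invalid", "raj@example.invalid"],
    "finance_dashboard": ["kim@example.invalid"],
}

email_recipients = direct_consumer_owners("stg_orders", downstream_of, owners_of)
```

Here the owner of `finance_dashboard` is not emailed for a change to `stg_orders`, because that dashboard sits two hops away. An owner of several direct consumers appears only once in the result.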
The notification is triggered by:
The dbt change notification script is built into a Docker image. This container is run every time someone clicks the “notify data consumers” step in the Buildkite pipeline. Inside the container, here is what happens:
The MYOB data platform features more than a thousand dbt transformations. With such a large dependency tree, changes to dataset schemas would frequently introduce breakages. Since onboarding Acryl, producers are more mindful of what downstream dependencies their datasets have, and consumers engage with data owners more actively.
Asad Naveed, MYOB’s Engineering Manager, said, “Before bringing Acryl on board, MYOB’s data teams would see multiple breaking changes per week. Since integrating Acryl into our workflow about a year ago, even though our overall usage of Snowflake has gone up 4 times, Acryl has helped us significantly reduce the number of breaking changes, to the extent that they are no longer a burden on all teams.”
With a focus on the MYOB use case, the Acryl team developed the dbt Impact Analysis GitHub Action. This GitHub Action automatically adds comments to dbt GitHub Pull Requests, providing a complete picture of the impact of changing a dbt model across all downstream entities. This covers immediate dependencies as well as those multiple hops away across different downstream tools. With this information, Pull Request creators can detect and prevent breaking changes before they are merged.
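The multi-hop part of such an analysis can be pictured as a breadth-first walk over the lineage graph. The sketch below illustrates that general technique, not the GitHub Action's actual code: the graph, the entity names, and the `downstream_impact` and `render_comment` helpers are assumptions for illustration.

```python
from collections import deque


def downstream_impact(model, lineage):
    """Breadth-first walk returning {entity: hops from model} for all downstream entities."""
    hops = {}
    pending = deque([(model, 0)])
    while pending:
        node, distance = pending.popleft()
        for child in lineage.get(node, []):
            # Record each entity once, at its shortest distance from the model.
            if child != model and child not in hops:
                hops[child] = distance + 1
                pending.append((child, distance + 1))
    return hops


def render_comment(model, hops):
    """Format the impact as a pull request comment body, nearest entities first."""
    lines = [f"Changing `{model}` affects {len(hops)} downstream entities:"]
    for entity, distance in sorted(hops.items(), key=lambda item: (item[1], item[0])):
        suffix = "hop" if distance == 1 else "hops"
        lines.append(f"- {entity} ({distance} {suffix} away)")
    return "\n".join(lines)


# Hypothetical lineage: each key maps to its direct downstream entities.
lineage = {
    "stg_orders": ["orders_daily", "orders_by_region"],
    "orders_daily": ["finance_dashboard"],
    "orders_by_region": ["finance_dashboard"],
}

impact = downstream_impact("stg_orders", lineage)
comment_body = render_comment("stg_orders", impact)
```

Breadth-first order means an entity reachable by several paths is reported at its shortest distance, and the visited check keeps the walk finite even if the lineage contains a cycle.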
Additionally, Acryl introduced Subscriptions and Notifications, making it easier for data platform teams to broadcast changes to downstream users. This system also lets users subscribe to specific datasets so that they learn quickly about any changes to their data.