Developing your App
Data Change Management
Context: Why DCM?

Context: why do we need Data Change Management (DCM)?

Data, like software, has a life of its own. As the environment evolves, so does the data it produces. On the consuming end, data is also a moving target. Business needs evolve, and data and metrics that need to be tracked change too.

As such, our data systems need built-in processes to evolve data. In SQL databases, we have ALTER TABLE statements. We also have ORMs and database migration tools. Those are well suited for Transactional workloads and database schema evolutions. However, they are not appropriate for Data Analytics and OLAP storage. Why? because the requirements are very different between the analytics and the transactional, especially when it comes to scale.

In transactional databases, we necessarily emphasize the data's consistency and the transactions' atomicity—schema change operations in SQL are atomic. When the data gets big, managing the schemas and their evolutions becomes challenging, and complex systems have been built to handle that complexity.

For analytical databases, we have stayed at the point where tables are evolved manually or through a similar schema change tool from the transactional side. Whilst the tools are similar to those used for transaction data migrations, the constraints are very different. More often than not, the data is event-based and immutable, meaning it can get duplicated without the overhead otherwise needed to keep versions in sync. Additionally, in analytical databases, the producer and consumer of the data are often decoupled, whereas in a transactional system, those are often the same. For example, in a transactional system, you might have one service, that has its database. No other service touches that database. All external requests go through the API. As such, evolving the database is self-contained to evolving the service.

On the analytics side, usually, the data is consumed by business dashboarding tools, and the dashboard evolves on a different schedule than the side producing the data. Most of the time, they are owned by different teams.

We've made progress in constraining the problem on the transactional side. However, the data on the analytics side is caught between two sides that are evolving at different speeds. Regrettably, the current tools at our disposal do not enable us to resolve the issue expeditiously.

Data Change Management in Moose is an attempt to solve for that.