What is Data Change Management (DCM)?

ℹ️

Data change management is in pre-alpha. We are releasing it to get feedback on its direction. We intend to polish it up in the coming weeks and months using the community's feedback (opens in a new tab).

What is Data Change Management (DCM)?

At its core, the concept of Data Change Management in MooseJS is simple: MooseJS will help you run multiple versions of your application in parallel.

The implications of this are profound. For example, you might have a data model where the schema has evolved (in breaking ways) across versions. Moose will help you run multiple versions of the schema in production in parallel. This lets data producers and data consumers migrate to the latest version of the schema on their own time schedule, without breaking things upstream and downstream.

Meanwhile, Moose will handle keeping your data in sync ACROSS ALL VERSIONS. So you don't drop data on the floor, and so data consumers pulling from the latest version get data that came in through all the older supported versions of the schema. This is pretty cool - let's see how it works.

DCM diagram

Versions

In the Moose Change Data Management model, code in the last commit of your main branch is the current, latest version. It is accordingly tagged as such in the version property in the package.json file in your Moose application's top-level folder.

Previous versions are tagged in the moose.config.toml file, [supported_old_versions] section. We decided to not use Git tags for workflow reasons and to let users leverage tags however they want. If you would rather manage versions through tags instead of a file committed in the repository, please let us know (opens in a new tab).

When a data model is part of a version, the table, ingestion endpoint, and topic are all named with that version. This duplicates some infrastructure such that all the data for each version is consistent and replayable.

We also maintain an alias to the latest version of tables and ingestion endpoints so that you can always send data to the latest version, if you so choose — the alias being the name of the table unmodified by version.

Continuous Migrations

We want to enable you to move forward with your versions and changes without breaking upstream or downstream processes. We also want you to have the latest data available and incentivize your data consumers to move to the latest version of the data model.

As such we continuously process data that lands in old versions to land in the latest version of the data. We do not, however, go back in time: data that lands in old versions of the data model will be populated in new Data Models, but data that lands in new Data Models will not be sent to old Data Models.

This is a continuous process that runs directly in the database. It is near real-time so you will get the freshest data in your latest version of the data model at all times.

This migration is defined in a SQL mapping file, which Moose populates with the expected migration, and can be changed by the user if specific migration mappings, typing, or other parameters need to be defined.

Initial Data Migrations

When a new version of the table is first created, it needs to be filled with data from the previous version. The initial data migration is the initial provisioning of data. When that happens, behind the scenes, the associated infrastructure is paused to ensure that no data gets duplicated accidentally between your versions of the tables.

Once the initial data load is complete in the new table, that infrastructure is unpaused and the data flows.

Context: Why DCM?Walkthrough