Developing your App
Data Change Management
Walkthrough

Walkthrough

In this walkthrough, we will cover:

  • How to set up multiple versions of your Data Models in a Moose application.
  • How to deprecate old versions of your Data Models
  • Requirements

  • NodeJS (opens in a new tab)
  • Docker (opens in a new tab)
  • macOS / Linux / Windows (Using the Linux Subsystem for Windows)
  • Git (opens in a new tab)
  • Initialize a new project

    In order to start this, we will start from a blank page, this will illustrate the concepts we are talking about and make sure we all start from the same point.

    In the terminal, go to a directory where you wish to run your experiment.

    Terminal
    $ cd ~
    Terminal
    $ npx create-moose-app@latest dcm-walkthrough
    Terminal
    $ cd dcm-walkthrough

    At the end of the initialization process, you will have two Data Models already defined, and the inside of your app folder should look like the following:

    • README.md
    • package.json
    • project.toml
        • models.ts
            • flow.ts
  • When you run

    Terminal
    $ npx @514labs/moose-cli@latest dev

    Moose will spin up a dev infrastructure for you to start building your moose app.

    Go to http://localhost:3001 from your browser. You should see the following console:

    Console Initialized

    As you can see here, you have a table provisioned with the suffix 0_0 for v0.0 which is the default version in your package.json. Your Data Model versions are handled by git.

    Defining a new version of the UserActivity Data Models

    The state of your files in your git repository of your moose application corresponds to the latest version of your data model.

    With moose you can have multiple versions of your data model deployed at the same time. That allows you to have loose coupling between your data infrastructure and the other parts of the data ecosystem (such as applications that emit the data, databases you extract data from, etc...)

    When you deploy a new version of your data model to be available for consumers or producers to put data into, you can keep your old version of the data model up and running. A continuous data migration process links together the two versions of a data model.

    Differently said, if data comes into your old model, the sync will bring it to your new model.

    Let's see this in action.

  • Stop the dev server from running (CTRL+C)
  • With your favorite text editor, open the app/datamodels/models.ts file above
  • Delete the line that says activity in the UserActivity data model, this will remove the activity from the data model and create a change.
  • Run the following command
  • Terminal
    $ npx @514labs/moose-cli@latest bump-version

    It will bump the version in your package.json file and add a pointer between the git commit and the previous version to the project.toml file.

    ./project.toml
    [supported_old_versions]
    "0.0" = "263297f"
  • $ npx @514labs/moose-cli@latest generate migrations
  • This command will generate a SQL mapping file in the flows directory:

    ./app/flows/UserActivity_migrate__0_0__0_1.sql
    (eventId, timestamp, userId, activity) -> (eventId, timestamp, userId)

    This is a SQL fragment (opens in a new tab) that gets embedded into two other queries:

  • One that creates a trigger that continuously copies data from the older version to the newer version.
  • One that initially loads data from the older version of the table to the newer version of the table.
  • This represents Moose's best guess at the migration path between versions, but please feel free to edit the SQL mapping file to specify particular mapping, typing, etc.

    This file allows a user to send data to the old data model version and the new data model version, and the data for the old version will continuously be migrated to the new data model.

    Inspect the resulting state

    Let's start by starting dev mode in the project

    Terminal
    $ npx @514labs/moose-cli@latest dev

    Once the CLI is done processing old versions, you will be able to open the dev console at http://localhost:3001/primitives/models/UserActivity?tab=query.

    On this query screen, you will be able to see the additional tables automatically created for you: UserActivity_0_1 is one of them.

    Run the following in your terminal to send some data to v0 of your data model

    Terminal
    $ curl -v -X POST \
        -H "Content-Type: application/json" \
        -d "{\"eventId\": \"1\", \"timestamp\": \"$(date '+%Y-%m-%d %H:%M:%S')\", \"userId\": \"1\", \"activity\": \"click\"}" \
        http://localhost:4000/ingest/UserActivity/0.0

    You will notice that it contains the activity field that we deleted.

    Now in the UI you can query this table by running in the query tab.

    SELECT * FROM local.UserActivity_0_0 LIMIT 50;

    You should see the event we just added there.

    The continuous migration working behind the scenes will allow you to also see the data inside local.UserActivity_0_1

    SELECT * FROM local.UserActivity_0_1 LIMIT 50;

    And now you should see the same record without the activity column. The data was seamlessly migrated to the new table. This is a trivial example, but the same feature holds true for more complex migrations.

    Now if we want to set some data to the new data model version:

    Terminal
    curl -v -X POST \
        -H "Content-Type: application/json" \
        -d "{\"eventId\": \"2\", \"timestamp\": \"$(date '+%Y-%m-%d %H:%M:%S')\", \"userId\": \"2\"}" \
        http://localhost:4000/ingest/UserActivity/0.1

    The data will show up in the 0.2 table but not in the 0.1 table. That enables you to keep old models alive with historical data as you migrate your infrastructure to produce and consume data on the new data model. Once all your data is migrated, you can remove the old version, and we will appropriately clean up the infrastructure.

    To summarize: the Data Change Management feature allows you to version your data infrastructure alongside your Data Models. As you update your Data Models, Moose will automate the creation of data infrastructure, and allow you to run the old data infrastructure alongside the new data infrastructure, keeping the old data flowing through to the new Data Models with migrations. Such migrations are defined by Moose, but you may change the definition as per your requirements. This allows you to treat your data infrastructure and your Data Models as you do your code.