Schema Mapper
Versioned data in any database

Whenever I am not occupied with work, I open up my editor and hack on Schema Mapper.
It is a project that I have become fascinated by as it covers many aspects of programming.

First, let me tell you a bit about what Schema Mapper is (or tries to be).

Schema Mapper

Imagine you are building an API for your product, and you want to store incoming data in multiple databases, for example in PostgreSQL and Elastic(search).
You now have to set up a PostgreSQL database, add the tables to it, and set up an Elasticsearch index with the correct mappings.
Now you release a mobile app that uses the API and everybody is happy.
A month later, you add some new features to your app that require changes to your schema, so you write and run a script that migrates the tables and updates the Elasticsearch mapping.
You start up your app and all you get is a white screen; when you look at the logs, you notice the app is trying to access a property that is no longer present in the data.
Now you know that you cannot rename or remove columns without potentially breaking things, so you roll back the migration and create a new database, Elasticsearch index, and API endpoint (/api/v2).
Your v2 endpoint stores the data in the new version of the database, then transforms it to match the v1 format and stores it there as well.
You modify the v1 endpoint so that, after storing data in the v1 database, it also transforms the data and inserts it into the v2 database.
New and old clients work again and everybody is happy.

A different approach

Schema Mapper was designed to make the use case from above as simple as possible.
It uses a single format for describing a schema, and a single format for describing data.
All changes you make to a schema are stored and used for transforming data between versions.
Schema Mapper typically runs on a server, but it can also run in the browser.
The only difference is that in the browser the data gets stored in LevelDB (using level-js, which stores data in IndexedDB).
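The transformation idea is easiest to see in code. Here is a minimal, self-contained sketch in plain JavaScript (illustrative only, not Schema Mapper's actual internals) of how keeping column ids stable across versions makes renames reversible:

```javascript
// Column *names* differ per version, but column *ids* stay stable across
// versions -- matching on ids is what makes transformation possible.
const v1 = { columns: { '1': { name: 'id', type: 'uuid' } } }
const v2 = {
  columns: {
    '1': { name: 'member_id', type: 'uuid' }, // same id as v1's "id", renamed
    '2': { name: 'email', type: 'string' }    // new in v2
  }
}

// Transform an item from one schema version to another by matching column ids.
function transform (item, from, to) {
  const out = {}
  for (const [colId, toCol] of Object.entries(to.columns)) {
    const fromCol = from.columns[colId]
    if (fromCol && fromCol.name in item) out[toCol.name] = item[fromCol.name]
  }
  return out
}

// Forward: a v1 item keeps its value under the new v2 name.
console.log(transform({ id: 'abc' }, v1, v2))                       // → { member_id: 'abc' }
// Backward: a v2 item loses the column that v1 does not know about.
console.log(transform({ member_id: 'abc', email: 'x' }, v2, v1))    // → { id: 'abc' }
```

Because every change is recorded against a column id rather than a name, the same mechanism works no matter how many renames pile up between versions.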

Server

Let's start a Schema Mapper server and configure some database drivers.

Installation

npm install -g schema-mapper-store

Usage

schema-mapper-store \
  --port 9090 \
  --driver [ elasticsearch --host http://localhost:9200 ] \
  --driver [ postgresql --host localhost --username postgres --password somepassword ] \
  --driver [ leveldb --path /leveldb ]

We now have a Schema Mapper store running on port 9090 with an Elasticsearch, PostgreSQL, and LevelDB driver enabled.

Browser

Installation

npm install schema-mapper-store babel-polyfill

Usage

// import dependencies
import 'babel-polyfill'
import LocalStore from 'schema-mapper-store/local'
import RemoteStore from 'schema-mapper-store/remote'

// setup stores
var remoteStore = RemoteStore.create('http://localhost:9090')
var store = LocalStore.create()

// sync local store with remote store
store.sync(remoteStore)

We have now set up a store in the browser and a connection to a remote store (running on localhost), and told the local store to sync with the remote one.

Before we can start entering data, we have to create a project, which goes like this:

// create a project
store.request({
  command: 'change.apply', // the change.apply command is used to make modifications to the schema
  data: { // this is the actual change object
    change: 'project.put', // this is the change we want to apply (create or update a project)
    projectId: '1', // this is the unique identifier for the project
    project: { // this is the project object
      name: 'helloworld', // this is the name for the project, which will be used for the "database" name
      version: 0, // the version of the project, 0 means "staging"
      drivers: ['elasticsearch'], // this tells the store which drivers to use when items are stored in the project
      schemas: { // this object describes the schemas for this project, the key is the unique id for the schema
        '1': {
          name: 'users', // this is the name for the schema, which will be used for the database "table" name
          primary: '1', // this tells the store which column is the primary key, it refers to the unique column id
          columns: { // these are the columns for the schema, the key is the unique id for the column
            '1': {
              name: 'id', // this is the name for the column
              type: 'uuid' // this is the column type
            }
          }
        }
      }
    }
  }
})

Here is what happened:
  • The store recorded a project.put change for project 1
  • The project is at version 0, which means it is still in staging: you can keep modifying it, and no databases are created yet

Once we are happy with the design of the project, we can tag it (increase the version). When we do, the database(s) will be created and the store is ready to accept data. Here is how you tag a project:

// tag the project
store.request({
  command: 'change.apply', // again we are going to make a modification to the schema, so we use the change.apply command
  data: { // the change object
    change: 'project.tag', // change we want to apply (tag the project)
    projectId: '1', // the project id to tag
    version: 1 // the version to tag it to
  }
})

Here is what happened:
  • The project was tagged: version 0 (staging) became version 1, and the schema for that version is now fixed
  • The enabled driver created the actual database, so an Elasticsearch index for helloworld at version 1 now exists

Now that the project is at version 1 and the databases are present, we can insert some data.

// create an item
store.request({
  command: 'item.create',
  data: {
    projectId: '1',
    projectVersion: 1,
    schemaId: '1',
    item: {
      id: '277f2c8e-2093-4f67-ab50-e3f295d40249'
    }
  }
})

Here is what happened:
  • The item was validated against the users schema of project version 1
  • The elasticsearch driver stored the item in the helloworld@1 database

This is still basic stuff, but it gets more interesting when we make some changes to our schema and tag the project again. To change the schema, we simply apply a 'project.put' change again.

// update a project
store.request({
  command: 'change.apply',
  data: {
    change: 'project.put',
    projectId: '1',
    project: {
      name: 'hello', // this was helloworld before
      version: 1, // note that it is now at version 1
      drivers: ['elasticsearch'],
      schemas: {
        '1': {
          name: 'members', // this was users before
          primary: '1',
          columns: {
            '1': {
              name: 'member_id', // this was id before
              type: 'uuid'
            },
            '2': { // this is a new column
              name: 'email',
              type: 'string'
            }
          }
        }
      }
    }
  }
})

Here is what happened:
  • The store recorded a new project.put change on top of version 1
  • The project, schema, and primary column were renamed, and an email column was added
  • Nothing changed in the databases yet: the change is staged until we tag again

Still not so exciting, but when we tag the project to version 2, the magic starts happening.

// tag the project
store.request({
  command: 'change.apply',
  data: {
    change: 'project.tag',
    projectId: '1',
    version: 2 // the new version
  }
})

Here is what happened:
  • The store created the database for version 2 (hello@2)
  • All existing items were migrated from helloworld@1 into hello@2, with the recorded renames applied along the way

We now have two databases, helloworld@1 and hello@2, both containing the same number of items (rows/documents) but with different schemas. So what happens when we store an item again? Let's try it.

// create an item
store.request({
  command: 'item.create',
  data: {
    projectId: '1',
    projectVersion: 2,
    schemaId: '1',
    item: {
      member_id: '9159b338-260f-4508-9a2b-545f7a506c1e',
      email: 'hello@koenschmeets.nl'
    }
  }
})

Here is what happened:
  • The item was stored in hello@2 as-is
  • It was also transformed to the version 1 schema (member_id back to id, email dropped) and stored in helloworld@1, so old clients keep seeing new data

Pretty neat huh?
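If we model the store's behaviour in plain JavaScript (again illustrative only, not the actual implementation), the dual write during item.create looks roughly like this:

```javascript
// An in-memory caricature of item.create: an item written at any version is
// transformed (by stable column id) and stored in every tagged version's database.
const versions = {
  1: { db: 'helloworld@1', columns: { '1': 'id' } },
  2: { db: 'hello@2', columns: { '1': 'member_id', '2': 'email' } }
}
const databases = { 'helloworld@1': [], 'hello@2': [] }

function createItem (item, atVersion) {
  const from = versions[atVersion].columns
  for (const schema of Object.values(versions)) {
    const row = {}
    for (const [colId, name] of Object.entries(schema.columns)) {
      if (from[colId] && from[colId] in item) row[name] = item[from[colId]]
    }
    databases[schema.db].push(row)
  }
}

createItem({ member_id: '9159b338-260f-4508-9a2b-545f7a506c1e', email: 'hello@koenschmeets.nl' }, 2)
console.log(databases['hello@2'])      // full item, v2 column names
console.log(databases['helloworld@1']) // same item, v1 column names, no email
```

The v1 copy has no email field at all, because version 1 has no column with id '2' to map it to; that is exactly the "app reads a property that is no longer there" problem from the introduction, solved at write time instead of in your API code.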

Before you start using this, note the dragons:

Limitations

NOTE: Schema Mapper is still in development and not ready for production use!!

Schema Mapper only gets its benefits by imposing quite a few limitations:
  • The only way to write to your databases (while staying consistent) is through Schema Mapper
  • Transactions are not supported
  • Some types cannot be mapped uniformly, so you are stuck with what Schema Mapper thinks is best (per-driver options will partially solve this)
  • Some database-specific functionality will be unusable (unless you know what you are doing), for example:
    • Cascades
    • Triggers
  • Schemaless databases still need a schema to work (for transforming data between versions)

Did I mention Schema Mapper is Free and Open-Source Software? I welcome contributors!

Schema Mapper on GitHub