Ways to implement data versioning in MongoDB

2019-01-02 16:08发布

Can you share your thoughts how would you implement data versioning in MongoDB. (I've asked similar question regarding Cassandra. If you have any thoughts which db is better for that please share)

Suppose that I need to version records in an simple address book. (Address book records are stored as flat json objects). I expect that the history:

  • will be used infrequently
  • will be used all at once to present it in a "time machine" fashion
  • there won't be more versions than few hundred to a single record. history won't expire.

I'm considering the following approaches:

  • Create a new object collection to store history of records or changes to the records. It would store one object per version with a reference to the address book entry. Such records would looks as follows:

    {
     '_id': 'new id',
     'user': user_id,
     'timestamp': timestamp,
     'address_book_id': 'id of the address book record' 
     'old_record': {'first_name': 'Jon', 'last_name':'Doe' ...}
    }
    

    This approach can be modified to store an array of versions per document. But this seems to be slower approach without any advantages.

  • Store versions as serialized (JSON) object attached to address book entries. I'm not sure how to attach such objects to MongoDB documents. Perhaps as an array of strings. (Modelled after Simple Document Versioning with CouchDB)

9条回答
时光乱了年华
2楼-- · 2019-01-02 16:49

I worked through this solution that accommodates a published, draft and historical versions of the data:

{
  published: {},
  draft: {},
  history: {
    "1" : {
      metadata: <value>,
      document: {}
    },
    ...
  }
}

I explain the model further here: http://software.danielwatrous.com/representing-revision-data-in-mongodb/

For those that may implement something like this in Java, here's an example:

http://software.danielwatrous.com/using-java-to-work-with-versioned-data/

Including all the code that you can fork, if you like

https://github.com/dwatrous/mongodb-revision-objects

查看更多
不流泪的眼
3楼-- · 2019-01-02 16:50

There is a versioning scheme called "Vermongo" which addresses some aspects which haven't been dealt with in the other replies.

One of these issues is concurrent updates, another one is deleting documents.

Vermongo stores complete document copies in a shadow collection. For some use cases this might cause too much overhead, but I think it also simplifies many things.

https://github.com/thiloplanz/v7files/wiki/Vermongo

查看更多
伤终究还是伤i
4楼-- · 2019-01-02 16:50

If you are using mongoose, I have found the following plugin to be a useful implementation of the JSON Patch format

mongoose-patch-history

查看更多
还给你的自由
5楼-- · 2019-01-02 16:53

Another option is to use mongoose-history plugin.

let mongoose = require('mongoose');
let mongooseHistory = require('mongoose-history');
let Schema = mongoose.Schema;

let MySchema = Post = new Schema({
    title: String,
    status: Boolean
});

MySchema.plugin(mongooseHistory);
// The plugin will automatically create a new collection with the schema name + "_history".
// In this case, collection with name "my_schema_history" will be created.
查看更多
人间绝色
6楼-- · 2019-01-02 16:54

The first big question when diving in to this is "how do you want to store changesets"?

  1. Diffs?
  2. Whole record copies?

My personal approach would be to store diffs. Because the display of these diffs is really a special action, I would put the diffs in a different "history" collection.

I would use the different collection to save memory space. You generally don't want a full history for a simple query. So by keeping the history out of the object you can also keep it out of the commonly accessed memory when that data is queried.

To make my life easy, I would make a history document contain a dictionary of time-stamped diffs. Something like this:

{
    _id : "id of address book record",
    changes : { 
                1234567 : { "city" : "Omaha", "state" : "Nebraska" },
                1234568 : { "city" : "Kansas City", "state" : "Missouri" }
               }
}

To make my life really easy, I would make this part of my DataObjects (EntityWrapper, whatever) that I use to access my data. Generally these objects have some form of history, so that you can easily override the save() method to make this change at the same time.

UPDATE: 2015-10

It looks like there is now a spec for handling JSON diffs. This seems like a more robust way to store the diffs / changes.

查看更多
琉璃瓶的回忆
7楼-- · 2019-01-02 16:58

Try using Javers. Good library.

查看更多
登录 后发表回答