CouchDB versioning strategy

Posted 2020-02-17 05:55

Question:

Would the following be a viable strategy for implementing versioning (using "example" as a sample document type):

Have one original document where the type field is named example_original.

Subsequent changes to the document all have type example_change and the id of example_original document as a key. The change would also carry a timestamp.

Keep one doc with type example_current that is the result of example_original with all example_change "applied". A new example_change document would automatically be applied to this document.

Finding a specific version would consist in retrieving the example_original doc and applying the desired changes (mostly up to a certain timestamp, but it could also be a number of changes).
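To make the retrieval step concrete, here is a minimal sketch of rebuilding a version from example_original plus its example_change documents. It works on plain dictionaries rather than a live database. The `set` and `unset` fields of a change, and the ISO 8601 string timestamps, are assumptions for illustration; the question does not say what a change document contains.

```python
from copy import deepcopy


def apply_changes(original, changes, up_to=None):
    """Rebuild a version by applying change documents in timestamp order.

    `up_to` is an inclusive timestamp cutoff; None applies every change.
    The stored original is never modified: all work happens on a copy.
    """
    doc = deepcopy(original)
    doc["type"] = "example_current"
    for change in sorted(changes, key=lambda c: c["timestamp"]):
        if up_to is not None and change["timestamp"] > up_to:
            break
        # Hypothetical "set" field: keys to add or overwrite.
        doc.update(change.get("set", {}))
        # Hypothetical "unset" field: keys to remove.
        for key in change.get("unset", []):
            doc.pop(key, None)
    return doc


original = {"_id": "doc-1", "type": "example_original",
            "title": "Draft", "body": "v1"}
changes = [
    {"type": "example_change", "original_id": "doc-1",
     "timestamp": "2020-01-02T00:00:00Z", "set": {"body": "v2"}},
    {"type": "example_change", "original_id": "doc-1",
     "timestamp": "2020-01-03T00:00:00Z",
     "set": {"title": "Final"}, "unset": ["body"]},
]

second_version = apply_changes(original, changes, up_to="2020-01-02T00:00:00Z")
latest = apply_changes(original, changes)
```

The cost of a lookup grows with the number of changes that have to be replayed, which is the main thing that would change if many changes were involved.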

I should mention that my use-case will involve a limited number of changes to the original. Most updates will consist of new original documents. While this is my current use-case, I would also be interested in issues that would result if many changes were involved.

What pros and cons do you see in this approach?

Answer 1:

My first worry is: When "getting" a certain version, can you apply the changes to the original without modifying the database?

Will you ever need to delete something from the history? Are you really sure? Really, really sure? How about branches?

All in all, this looks like a complex strategy. Keep in mind that I've heard about CouchDB but never used it. I'd go for a simpler approach:

  1. When you create a document, you assign a UUID. Don't use the name or you'll run into trouble during rename operations. Add a version field that reads "1". Create a second document which contains a list of documents with the same UUID or add a "parent" pointer to the first document.

    Having a "history document" per document allows for faster navigation of the history but parent pointers are more "safe" (since you can't easily create illegal structures with them).

  2. When you create a new revision, reuse the UUID and assign a new, unique version. Update the history document or the parent pointer.
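The parent-pointer variant of the two steps above could look roughly like this. It is a sketch on plain dictionaries, not a CouchDB client. The field names (`uuid`, `version`, `parent`, `content`) and the `uuid:version` id format are assumptions, and the integer version counter assumes a linear history; branches would need some other unique version label.

```python
import uuid


def new_document(content):
    """Step 1: create the first revision with a fresh UUID and version 1."""
    doc_uuid = str(uuid.uuid4())
    return {"_id": f"{doc_uuid}:1", "uuid": doc_uuid, "version": 1,
            "parent": None, "content": content}


def new_revision(previous, content):
    """Step 2: reuse the UUID, assign the next version, point at the parent."""
    version = previous["version"] + 1
    return {"_id": f"{previous['uuid']}:{version}", "uuid": previous["uuid"],
            "version": version, "parent": previous["_id"], "content": content}


def history(store, doc_id):
    """Follow parent pointers from `doc_id` back to the first revision.

    `store` is any mapping from document id to document.
    """
    chain = []
    while doc_id is not None:
        doc = store[doc_id]
        chain.append(doc)
        doc_id = doc["parent"]
    return chain
```

Every revision is stored whole, so reading a given version is a single lookup; only listing the history requires walking the chain.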

This strategy is pretty simple to implement and allows all kinds of flexibility later. You can erase parts of the history easily, rename is simple, and you can create branches.



Answer 2:

Simple Document Versioning with CouchDB

The versioning as attachments approach described in this article should fit most people's requirements for versioning.
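The linked article is not reproduced here, so the following is only one possible reading of "versions as attachments": before a document is updated, its previous content is serialized and stored on the same document as an inline attachment. The `_attachments` layout (`content_type` plus base64 `data`) is CouchDB's inline attachment format; the function name and the `rev-N.json` naming scheme are assumptions.

```python
import base64
import json


def archive_previous(current, new_fields):
    """Return the next document body, keeping `current` as an attachment.

    `current` is the document as fetched (including `_id` and `_rev`);
    the result is what would be written back to the database.
    """
    # Snapshot only the user fields, not CouchDB bookkeeping such as _rev.
    snapshot = {k: v for k, v in current.items() if not k.startswith("_")}
    encoded = base64.b64encode(
        json.dumps(snapshot, sort_keys=True).encode("utf-8")
    ).decode("ascii")

    # Keep existing attachments, otherwise the update would drop them.
    attachments = dict(current.get("_attachments", {}))
    # Hypothetical naming scheme; assumes only version snapshots are attached.
    name = f"rev-{len(attachments) + 1}.json"
    attachments[name] = {"content_type": "application/json", "data": encoded}

    updated = {**current, **new_fields}
    updated["_attachments"] = attachments
    return updated
```

With this layout the current version stays cheap to read and to index in views, while older versions are fetched only on demand.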



Answer 3:

What is the business status of these documents, especially legal? I have worked in situations where your proposal would not be appropriate from a business perspective, because of a need to prove that the document presented as v.3 really is version 3 of the document. Dynamically applying deltas would not cut the compliance mustard.

If, as you say, changes to documents are infrequent, then you are not going to be saving much disk space by storing deltas instead of whole documents. Storing whole documents also allows for the reliable prediction of the retrieval time for any document. It also reduces the complexity of the retrieval process.



Answer 4:

A strategy for versioning with CouchDB is to NOT ever compact the database which contains the documents for which you need to keep a full history. You could still compact other databases. This simple strategy works today out of the box with an edit conflict resolution strategy.

Deleting a document could be done by writing a new version with no content but a deleted property set.

Branches cannot be done this way because the versioning mechanism offers a single thread of revisions.
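For reference, reading old revisions under this approach goes through CouchDB's document API: `revs_info=true` lists the known revisions of a document and `rev=...` fetches one of them. The sketch below only builds the request URLs and a deletion body, so it runs without a server; the helper names and the base URL are assumptions. Note that CouchDB marks a deletion with the `_deleted` field, and that old revision bodies are only retrievable while the database remains uncompacted.

```python
from urllib.parse import quote, urlencode


def _doc_url(base, db, doc_id):
    """Build the URL of a document, escaping the database and document ids."""
    return f"{base}/{quote(db, safe='')}/{quote(doc_id, safe='')}"


def revs_info_url(base, db, doc_id):
    """URL for GET: the document plus the list of its known revisions."""
    return f"{_doc_url(base, db, doc_id)}?{urlencode({'revs_info': 'true'})}"


def revision_url(base, db, doc_id, rev):
    """URL for GET: one specific revision of the document."""
    return f"{_doc_url(base, db, doc_id)}?{urlencode({'rev': rev})}"


def tombstone(doc):
    """Body for a 'deleted' version: no content, only the deletion marker."""
    return {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True}


base = "http://localhost:5984"  # assumed local CouchDB address
list_url = revs_info_url(base, "docs", "report 1")
old_url = revision_url(base, "docs", "report 1", "2-abc")
```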

Now for the possible future of CouchDB:

  • Today each revision holds a full copy of the document but one could think that optimizations of the CouchDB engine could one day store deltas.
  • It is also possible that in the future CouchDB would offer an API to prevent the compaction of certain document types. This would allow keeping all the documents in the same database. This would be an easy patch to CouchDB.
  • This strategy does not enable the management of document branches, but given the nature of CouchDB as a document database, branch support is a reasonable, if long-term, possibility.