Are old data accessible in CouchDB?

Posted 2019-04-04 15:03

I've read a bit about CouchDB and I'm really intrigued by the fact that it's "append-only". I may be misunderstanding that, but as I understand it, it works a bit like this:

  • data is added at time t0 to the DB telling that a user with ID 1's name is "Cedrik Martin"

  • a query asking "what is the name of the user with ID 1?" returns "Cedrik Martin"

  • at time t1 an update is made to the DB telling: "User with ID 1's name is Cedric Martin" (changing the 'k' to a 'c').

  • a query asking again "what is the name of the user with ID 1" now returns "Cedric Martin"

It's a silly example, but it's because I'd like to understand something fundamental about CouchDB.

Given that the update was made by appending to the end of the DB, is it possible to query the DB "as it was at time t0" without doing anything special?

Can I ask CouchDB "What was the name of the user with ID 1 at time t0?" ?

EDIT: The first answer is very interesting, so I have a more precise question: as long as I'm not "compacting" a CouchDB database, can I write queries that are in some sense "referentially transparent" (i.e. they always produce the same result)? For example, if I query for "document d at revision r", am I guaranteed to always get the same answer back as long as I'm not compacting the DB?

4 answers
#2 · 2019-04-04 15:08

Perhaps the most common mistake made with CouchDB is to believe it provides a versioning system for your data. It does not.

Compaction removes all non-latest revisions of all documents and replication only replicates the latest revisions of any document. If you need historical versions, you must preserve them in your latest revision using any scheme that seems good to you.

"_rev" is, as noted, an unfortunate name, but no clearer alternative has been suggested. "_mvcc" and "_mvcc_token" have been suggested before. The issue with both is that any description of what's going on there will inevitably include "old versions remain on disk until compaction", which will still imply that it's a user versioning system.

To answer the question "Can I ask CouchDB 'What was the name of the user with ID 1 at time t0?'", the short answer is "NO". The long answer is "YES, but then later it won't work", which is just another way of saying "NO". :)

闹够了就滚
#3 · 2019-04-04 15:16

As already said, it is technically possible, but you shouldn't count on it. It isn't only about compaction; it's also about replication, one of CouchDB's biggest strengths. But yes, if you never compact and never replicate, you will always be able to fetch all previous versions of all documents. I don't think it will work with queries, though: as far as I know, they can't work with older versions.

Basically, calling it "rev" was the biggest mistake in CouchDB's design. It should have been called "mvcc_token" or something like that: it really only implements MVCC and isn't meant to be used for versioning.

神经病院院长
#4 · 2019-04-04 15:24

What you call t0 (t1, ...) is called a "revision" in CouchDB. Each time you change a document, its revision number increases. A document's old revisions are kept until you no longer want them and tell the database to "compact". See "Accessing Previous Revisions" in http://wiki.apache.org/couchdb/HTTP_Document_API
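
As a rough illustration of the kind of request that page describes: a specific stored revision is requested by adding a `rev` query parameter to the document URL, and `revs_info=true` asks for the list of revisions the database still knows about. The helper name `docUrl`, the base URL, the database name `users`, and the revision string are all made up for this sketch.

```javascript
// Sketch only: base URL, database name, document ID, and rev value are placeholders.
function docUrl(baseUrl, db, docId, params = {}) {
  const url = new URL(`${encodeURIComponent(db)}/${encodeURIComponent(docId)}`, baseUrl);
  for (const [name, value] of Object.entries(params)) {
    url.searchParams.set(name, value);
  }
  return url.toString();
}

const base = "http://localhost:5984/";
// Lists the revisions CouchDB still knows about for this document.
const revisionListUrl = docUrl(base, "users", "1", { revs_info: "true" });
// Asks for one specific revision of the document.
const oldRevisionUrl = docUrl(base, "users", "1", { rev: "1-abcdef" });

console.log(revisionListUrl);
console.log(oldRevisionUrl);
```

A GET on the second URL can only return the old body while that revision has not yet been removed by compaction, which is why the other answers advise against relying on it.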

时光不老,我们不散
#5 · 2019-04-04 15:25

Answer to the second question: YES.

Changed data is always added to the tree under a higher revision number. An existing revision is never changed.

For your information:

The revision (1-abcdef) is built like this: the 1 is the version number (here, the first version), and the part after the dash is a hash over the document content (I'm not sure whether some extra "salt" goes into it). So the same document content will always produce the same revision ID (with the same CouchDB setup), even on other machines, as long as it is at the same change level (1-, 2-, 3-).
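
To make the two-part structure concrete, here is a small sketch that splits a revision string into its numeric prefix and the part after the dash. The names `parseRev`, `generation`, and `suffix` are my own labels, not CouchDB terminology taken from this thread, and the sample revision strings are invented.

```javascript
// Split a revision string such as "1-abcdef" into its two parts.
function parseRev(rev) {
  const match = /^(\d+)-(.+)$/.exec(rev);
  if (!match) {
    throw new Error(`not a revision string: ${rev}`);
  }
  return { generation: Number(match[1]), suffix: match[2] };
}

// Order revisions by the numeric prefix rather than as plain strings,
// so that "10-..." sorts after "2-...".
function compareGeneration(a, b) {
  return parseRev(a).generation - parseRev(b).generation;
}

const revs = ["10-aaa", "2-bbb", "1-abcdef"];
const sorted = [...revs].sort(compareGeneration);

console.log(parseRev("1-abcdef"));
console.log(sorted);
```

Only the numeric prefix says anything about ordering; the suffix should be treated as an opaque token.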

Alternatively, if you need to keep old versions, you can store them inside a bigger document:

{
 "_id":"docHistoryContainer_5374",
 "doc_id":"5374",
 "versions":[
   {"v":1,
    "date":[2012,3,15],
    "doc":{ .... doc_content v1....}
   },
   {"v":2,
    "date":[2012,3,16],
    "doc":{ .... doc_content v2....}
   }
 ]
}
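
A container document of this shape has to be maintained by the application itself. A minimal sketch of that step, assuming the field names from the example above (with CouchDB's `_id` as the document ID field); the helper name `appendVersion` and the sample document contents are hypothetical.

```javascript
// Return a new container with `doc` appended as the next version.
// `date` is a [year, month, day] array, as in the example document.
function appendVersion(container, doc, date) {
  const versions = container.versions || [];
  const last = versions.length ? versions[versions.length - 1].v : 0;
  return {
    ...container,
    versions: [...versions, { v: last + 1, date: date, doc: doc }],
  };
}

let history = { _id: "docHistoryContainer_5374", doc_id: "5374", versions: [] };
history = appendVersion(history, { name: "Cedrik Martin" }, [2012, 3, 15]);
history = appendVersion(history, { name: "Cedric Martin" }, [2012, 3, 16]);

console.log(JSON.stringify(history, null, 1));
```

Saving the returned object back to CouchDB is an ordinary document update, so every earlier version travels inside the latest revision and survives compaction and replication.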

then you can ask for revisions:

View "byRev":

function (doc) {
  // One row per stored version, keyed by [doc_id, version number].
  (doc.versions || []).forEach(function (ver) {
    emit([doc.doc_id, ver.v], ver);
  });
}

call:

/byRev?startkey=["5374"]&endkey=["5374",{}]

result:

{"id":"docHistoryContainer_5374","key":["5374",1],"value":{"v":1,"date":[2012,3,15],"doc":{ .... doc_content v1....}}}
{"id":"docHistoryContainer_5374","key":["5374",2],"value":{"v":2,"date":[2012,3,16],"doc":{ .... doc_content v2....}}}

Additionally, you can write a map function that emits the date in the key, so you can ask for revisions in a date range.
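
One way such a date-keyed map function might look is sketched below, run locally against a stand-in `emit` so the resulting keys can be inspected. The view name `byDate`, the stub `emit`, and the sample document contents are assumptions made for illustration; inside CouchDB only the body of `byDate` would be stored in the design document.

```javascript
// Stand-in for CouchDB's view engine: collects whatever the map function emits.
const rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Hypothetical "byDate" map function: the key is [doc_id, year, month, day].
function byDate(doc) {
  (doc.versions || []).forEach(function (ver) {
    emit([doc.doc_id].concat(ver.date), ver);
  });
}

byDate({
  _id: "docHistoryContainer_5374",
  doc_id: "5374",
  versions: [
    { v: 1, date: [2012, 3, 15], doc: { name: "Cedrik Martin" } },
    { v: 2, date: [2012, 3, 16], doc: { name: "Cedric Martin" } },
  ],
});

console.log(JSON.stringify(rows.map((row) => row.key)));
```

With keys of this shape, a range such as startkey=["5374",2012,3,1]&endkey=["5374",2012,3,31] would select the versions stored in March 2012.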
