Sequence number bug in CouchDB 2 or is there anoth

Posted 2019-08-19 15:45

Question:

I’m digging deeper into CouchDB 2 and I’m finding some unexpected ordering with sequence numbers. In one case, I found that an early change in a _changes feed has the sequence number

99-g1AAAAI-eJyd0EsOgjAQBuAGiI-dN9C9LmrBwqzkJtrSNkgQV6z1JnoTvYneBEvbhA0aMU1mkj6-_NMSITTJfYFm2anOcsFT10mpTzyG-LxpmiL32eqoN8aEAcWE9dz_jPCFrnzrHGQchiFM4kSgaV0JqQ6VFF-AtAV2DggMgCEGxrNhQfatc3bOyDiKUalg2EBVoCu66KapazcUh41e69-GssjNIvcWWRokk2oNofwj0MNazy4QFURhGQ0J9LKI-SHPIBHEgiak51nxBhxnrRk

The last sequence number in my _changes feed, for the same DB, is

228-g1AAAAJFeJyd0EkOgjAUBuAGTJCdN9AjlIKFruQm2jFAEFes9SZ6E72J3gQ7JW7QCGnyXtLhy-vfAgCWVSjAip96XglW-o5afRJQwNbDMDRVSOuj3ogQJRgiOnL_O8I2urKdd4B1KCRpkRcCxH0npKo7KX4ApQH2HogsAElOKOPTBjkY5-yd2DqKYqnItA91C13BRTdNXY0VWouRrV7JDOvmrLuxlLW4VAlJ5Qzr4aznJ2wskIIy-y9sh7wcYoMKLJKRXOACjTxr3uHcsBE

In a browser console, the following comparison evaluates to false:

'228-g1AAAAJFeJyd0EkOgjAUBuAGTJCdN9AjlIKFruQm2jFAEFes9SZ6E72J3gQ7JW7QCGnyXtLhy-vfAgCWVSjAip96XglW-o5afRJQwNbDMDRVSOuj3ogQJRgiOnL_O8I2urKdd4B1KCRpkRcCxH0npKo7KX4ApQH2HogsAElOKOPTBjkY5-yd2DqKYqnItA91C13BRTdNXY0VWouRrV7JDOvmrLuxlLW4VAlJ5Qzr4aznJ2wskIIy-y9sh7wcYoMKLJKRXOACjTxr3uHcsBE' > '99-g1AAAAI-eJyd0EsOgjAQBuAGiI-dN9C9LmrBwqzkJtrSNkgQV6z1JnoTvYneBEvbhA0aMU1mkj6-_NMSITTJfYFm2anOcsFT10mpTzyG-LxpmiL32eqoN8aEAcWE9dz_jPCFrnzrHGQchiFM4kSgaV0JqQ6VFF-AtAV2DggMgCEGxrNhQfatc3bOyDiKUalg2EBVoCu66KapazcUh41e69-GssjNIvcWWRokk2oNofwj0MNazy4QFURhGQ0J9LKI-SHPIBHEgiak51nxBhxnrRk'

Is this a bug or do I need to use some other method to compare sequence numbers?

In looking at the other sequence numbers in my _changes feed, it looks like they are generally ordered as I would expect, but in this case it appears that when the first number, e.g. 99, jumps from 2 digits to 3 digits, the ordering breaks. If you boil this down to a simple string comparison example, you can see that '228' > '99' => false
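To make the observation above concrete, here is a minimal sketch of why the comparison comes out false. The shortened strings are hypothetical stand-ins for the full sequence values, and `seqPrefix` is a helper name made up for this illustration. It only explains the JavaScript result; it says nothing about whether the leading number is meaningful to compare.

```javascript
// JavaScript compares strings character by character, so the first
// characters decide: '2' (char code 50) sorts before '9' (char code 57).
const a = '228-g1AAAAJF'; // shortened stand-in for the later sequence value
const b = '99-g1AAAAI-';  // shortened stand-in for the earlier sequence value

const stringCompare = a > b; // lexicographic comparison of the whole string

// Hypothetical helper: parse only the integer in front of the first '-'.
function seqPrefix(seq) {
  return Number.parseInt(seq.split('-', 1)[0], 10);
}

const numericCompare = seqPrefix(a) > seqPrefix(b); // compares 228 with 99

console.log(stringCompare, numericCompare);
```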

Answer 1:

The following answer contains excerpts from an email thread with @rnewson. I hope it helps someone else to understand sequence numbers in CouchDB 2. Thanks, Robert!

The background:

There's no easy way to compare them in 2.0 and no requirement for them to be in order. They are not, in short, designed to be examined or compared outside of couchdb; treat them opaquely.

The number on the front is the sum of the individual update sequences encoded in the second part and exists only to trick older versions of the couchdb replicator into making checkpoints.

The latter half of the sequence string is an encoded list of {node, range, seq} tuples (where seq is the integer value you know from pre-2.0 releases). When a sequence string is passed back in, as the since= parameter, couchdb decodes this string and passes the appropriate integer seq value to the individual shard.

All that said, in general the front number should increase. The full strings themselves are not comparable, since there's no defined order to the encoded list (so two strings could be generated that are encoded differently but decode to the same list of tuples, just in a different order).

Another aspect to this is that the changes feed is not totally ordered. For a given shard it is totally ordered (a shard being identical to a pre-2.0 database with an integer sequence), and couchdb doesn't shuffle that output (though correctness of replication would be retained if it did). A clustered database is composed of several shards, though (the 'q' value, defaulting to 4 iirc). The clustered changes feed combines those separate changes feeds into a single one, but makes no effort to impose a total order over that. We don't do it because it would be expensive and unnecessary.

The solution if you need to listen on a _changes feed and then restart from where you left off later:

The algorithm for correctly consuming the changes feed is:

  1. read /dbname/_changes
  2. process each row idempotently
  3. periodically (every X seconds or every X rows) store the "seq" value of the last row you processed

If you ever crash, or if you weren't using continuous=true, you can run the same procedure again with step 1 modified:

revised 1. read /dbname/_changes?since=X

where X is the value you saved in step 3. If you're not using continuous mode then you could just record the "last_seq" value at the end of consuming the non-continuous response. You run the risk of reprocessing a lot more items, though.

With this scheme (which the replicator and all indexers follow), you don't care if the results come out of order, and you never need to compare any two seq values.
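The steps above could be sketched roughly as follows for the non-continuous case. This is not the replicator's code, just an illustration: `consumeChanges` and all of its option names (`loadCheckpoint`, `saveCheckpoint`, `processRow`, `checkpointEvery`, `fetchImpl`) are made up here, the base URL is assumed to have no trailing slash, and a global `fetch` (browser or recent Node.js) is assumed unless you pass your own. The only CouchDB details relied on are the `since` query parameter and the `results`, `seq` and `last_seq` fields of the `_changes` response.

```javascript
// Sketch of a checkpointing _changes consumer (non-continuous mode).
// The seq value is stored and passed back as-is; it is never inspected.
async function consumeChanges({
  baseUrl,              // e.g. 'http://localhost:5984' (assumed, no trailing slash)
  db,                   // database name
  loadCheckpoint,       // async () => previously saved seq, or undefined
  saveCheckpoint,       // async (seq) => persist seq somewhere durable
  processRow,           // async (row) => must be safe to run more than once
  checkpointEvery = 100,
  fetchImpl = fetch,    // injectable so the sketch can be exercised offline
}) {
  // Step 1 (or "revised 1" when a checkpoint exists).
  const since = await loadCheckpoint();
  const url = new URL(`${baseUrl}/${encodeURIComponent(db)}/_changes`);
  if (since != null) url.searchParams.set('since', since);

  const response = await fetchImpl(url.toString());
  if (!response.ok) {
    throw new Error(`_changes request failed with status ${response.status}`);
  }
  const body = await response.json();

  // Step 2: process rows in the order they arrive, whatever that order is.
  let processed = 0;
  for (const row of body.results) {
    await processRow(row);
    processed += 1;
    // Step 3: periodically store the seq of the last processed row.
    if (processed % checkpointEvery === 0) await saveCheckpoint(row.seq);
  }

  // Non-continuous response fully consumed: record last_seq.
  await saveCheckpoint(body.last_seq);
  return processed;
}
```

If the process dies between two checkpoints, the next run simply re-reads from the last stored value and sees some rows again, which is why `processRow` has to tolerate repeats.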

You do need to ensure you can correctly process the same change multiple times. For an example of that, consider the replicator: when it sees a row from a changes feed, it asks the target database if it contains the _id and _rev values from that row. If it does, the replicator moves on to the next row. If it doesn't, it tries to write the document in that row to the target database. In the event of a crash, and therefore a call to _changes with a seq value from before processing that row, it will ask the target database if it has the _id/_rev again, only this time the target will say yes.
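As a rough illustration of that "ask first, then write" pattern, here is a sketch that uses an in-memory object in place of a real target database. `makeTarget` and `applyChange` are hypothetical names invented for this example and are not CouchDB APIs; the row shape (`id` plus a `changes` array of `{ rev }` objects) follows what `_changes` returns.

```javascript
// In-memory stand-in for a target database: remembers which _rev values
// it holds for each _id and counts how many writes actually happened.
function makeTarget() {
  const revsById = new Map();
  let writes = 0;
  return {
    has(id, rev) {
      const revs = revsById.get(id);
      return revs !== undefined && revs.has(rev);
    },
    write(id, rev) {
      if (!revsById.has(id)) revsById.set(id, new Set());
      revsById.get(id).add(rev);
      writes += 1;
    },
    writeCount() {
      return writes;
    },
  };
}

// Idempotent handler: only writes revisions the target does not have yet.
// Returns the number of revisions written for this row.
function applyChange(row, target) {
  let written = 0;
  for (const { rev } of row.changes) {
    if (target.has(row.id, rev)) continue; // already there, move on
    target.write(row.id, rev);
    written += 1;
  }
  return written;
}
```

Feeding the same row through `applyChange` a second time, as would happen after restarting from an older checkpoint, finds the revision already present and performs no further write.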