I’m digging deeper into CouchDB 2 and I’m finding some unexpected ordering with sequence numbers. In one case, I found that an early change in a _changes feed has the sequence number:
99-g1AAAAI-eJyd0EsOgjAQBuAGiI-dN9C9LmrBwqzkJtrSNkgQV6z1JnoTvYneBEvbhA0aMU1mkj6-_NMSITTJfYFm2anOcsFT10mpTzyG-LxpmiL32eqoN8aEAcWE9dz_jPCFrnzrHGQchiFM4kSgaV0JqQ6VFF-AtAV2DggMgCEGxrNhQfatc3bOyDiKUalg2EBVoCu66KapazcUh41e69-GssjNIvcWWRokk2oNofwj0MNazy4QFURhGQ0J9LKI-SHPIBHEgiak51nxBhxnrRk
The last sequence number in my _changes feed, for the same DB, is:
228-g1AAAAJFeJyd0EkOgjAUBuAGTJCdN9AjlIKFruQm2jFAEFes9SZ6E72J3gQ7JW7QCGnyXtLhy-vfAgCWVSjAip96XglW-o5afRJQwNbDMDRVSOuj3ogQJRgiOnL_O8I2urKdd4B1KCRpkRcCxH0npKo7KX4ApQH2HogsAElOKOPTBjkY5-yd2DqKYqnItA91C13BRTdNXY0VWouRrV7JDOvmrLuxlLW4VAlJ5Qzr4aznJ2wskIIy-y9sh7wcYoMKLJKRXOACjTxr3uHcsBE
In a browser console, the following comparison evaluates to false:
'228-g1AAAAJFeJyd0EkOgjAUBuAGTJCdN9AjlIKFruQm2jFAEFes9SZ6E72J3gQ7JW7QCGnyXtLhy-vfAgCWVSjAip96XglW-o5afRJQwNbDMDRVSOuj3ogQJRgiOnL_O8I2urKdd4B1KCRpkRcCxH0npKo7KX4ApQH2HogsAElOKOPTBjkY5-yd2DqKYqnItA91C13BRTdNXY0VWouRrV7JDOvmrLuxlLW4VAlJ5Qzr4aznJ2wskIIy-y9sh7wcYoMKLJKRXOACjTxr3uHcsBE' > '99-g1AAAAI-eJyd0EsOgjAQBuAGiI-dN9C9LmrBwqzkJtrSNkgQV6z1JnoTvYneBEvbhA0aMU1mkj6-_NMSITTJfYFm2anOcsFT10mpTzyG-LxpmiL32eqoN8aEAcWE9dz_jPCFrnzrHGQchiFM4kSgaV0JqQ6VFF-AtAV2DggMgCEGxrNhQfatc3bOyDiKUalg2EBVoCu66KapazcUh41e69-GssjNIvcWWRokk2oNofwj0MNazy4QFURhGQ0J9LKI-SHPIBHEgiak51nxBhxnrRk'
Is this a bug or do I need to use some other method to compare sequence numbers?
Most of the other sequence numbers in my _changes feed are ordered as I would expect, but the ordering appears to break when the leading number grows from two digits to three (here, from 99 to 228). Reduced to a plain string comparison, '228' > '99' evaluates to false.
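A smaller reproduction using only the leading numbers. JavaScript compares strings one character at a time, so the result is decided by '2' versus '9' and the length of the number never matters:

```javascript
// Strings are compared lexicographically, character by character.
const asStrings = '228' > '99'; // '2' sorts before '9', so this is false
const asNumbers = 228 > 99;     // numeric comparison, so this is true

console.log(asStrings, asNumbers); // false true
```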
The following answer contains excerpts from an email thread with @rnewson. I hope it helps someone else to understand sequence numbers in CouchDB 2. Thanks, Robert!
The background:
There's no easy way to compare them in 2.0 and no requirement for them
to be in order. They are not, in short, designed to be examined or
compared outside of couchdb; treat them opaquely.
The number on the front is the sum of the individual update sequences
encoded in the second part and exists only to trick older versions of
the couchdb replicator into making checkpoints.
The latter half of the sequence string is an encoded list of {node,
range, seq} tuples (where seq is the integer value you know from
pre-2.0 releases). When a sequence string is passed back in, as the
since= parameter, couchdb decodes this string and passes the
appropriate integer seq value to the individual shard.
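To make that structure concrete, here is a rough model of what a decoded sequence might look like. Everything in it is an assumption for illustration: the node names, ranges, and seq values are invented, seqForShard is a hypothetical helper, and the sketch does not attempt to reproduce the real internal encoding of the string.

```javascript
// Hypothetical decoded form of a sequence string: one entry per shard.
// All values below are made up for illustration.
const decoded = [
  { node: 'couchdb@node1', range: '00000000-3fffffff', seq: 30 },
  { node: 'couchdb@node2', range: '40000000-7fffffff', seq: 25 },
  { node: 'couchdb@node3', range: '80000000-bfffffff', seq: 24 },
  { node: 'couchdb@node1', range: 'c0000000-ffffffff', seq: 20 },
];

// The number on the front is the sum of the per-shard update sequences.
const prefix = decoded.reduce((sum, entry) => sum + entry.seq, 0);

// When the string comes back as since=, each shard only needs its own
// integer seq. Returning 0 for an unknown shard is an assumption of this
// sketch (it would mean "start that shard from the beginning").
function seqForShard(entries, range) {
  const entry = entries.find((e) => e.range === range);
  return entry ? entry.seq : 0;
}

console.log(prefix);                                    // 99
console.log(seqForShard(decoded, '80000000-bfffffff')); // 24
```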
All that said, in general the front number should increase. The full
strings themselves are not comparable, since there's no defined order
to the encoded list (so two strings could be generated that are
encoded differently but decode to the same list of tuples, just in a
different order).
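If you only want a rough, human-readable sense of progress (for logging, say), you can peel off the leading number and compare it numerically. This is a heuristic that assumes the seq has the "number, dash, opaque part" shape shown in the question. seqPrefix is a hypothetical helper, not a CouchDB API, and the seq strings below are shortened stand-ins. Nothing about the result should drive correctness.

```javascript
// Hypothetical helper: extract the leading integer from a seq value.
// Pre-2.0 seqs are plain integers, so non-string input is stringified first.
function seqPrefix(seq) {
  const text = String(seq);
  const dash = text.indexOf('-'); // first dash ends the numeric prefix
  const head = dash === -1 ? text : text.slice(0, dash);
  const n = Number.parseInt(head, 10);
  if (Number.isNaN(n)) {
    throw new TypeError('seq has no numeric prefix: ' + text);
  }
  return n;
}

// Shortened stand-ins for the two seqs in the question.
const early = '99-g1AAAAI-eJyd';
const late = '228-g1AAAAJFeJyd';

console.log(late > early);                       // false (string comparison)
console.log(seqPrefix(late) > seqPrefix(early)); // true (numeric prefixes)
```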
Another aspect to this is that the changes feed is not totally
ordered. For a given shard it is totally ordered (a shard being
identical to a pre-2.0 database with an integer sequence), and couchdb
doesn't shuffle that output (though correctness of replication would
be retained if it did). A clustered database is composed of several
shards, though (the 'q' value, defaulting to 4 iirc). The clustered
changes feed combines those separate changes feeds into a single one,
but makes no effort to impose a total order over that. We don't do it
because it would be expensive and unnecessary.
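A toy illustration of that point. This is not how CouchDB merges shard feeds; the round-robin interleave is an arbitrary stand-in, and the shard names and seq values are invented. Each shard emits its changes in strict order, and the combined feed keeps that per-shard order without having any global order.

```javascript
// Two hypothetical shards, each totally ordered by its own integer seq.
const shardFeeds = {
  a: [{ shard: 'a', seq: 1 }, { shard: 'a', seq: 2 }, { shard: 'a', seq: 3 }],
  b: [{ shard: 'b', seq: 1 }, { shard: 'b', seq: 2 }],
};

// Round-robin interleave: take one row from each non-empty shard in turn.
function interleave(feeds) {
  const queues = Object.values(feeds).map((rows) => [...rows]);
  const merged = [];
  while (queues.some((q) => q.length > 0)) {
    for (const q of queues) {
      if (q.length > 0) merged.push(q.shift());
    }
  }
  return merged;
}

const merged = interleave(shardFeeds);
console.log(merged.map((r) => r.shard + r.seq).join(' ')); // a1 b1 a2 b2 a3
```

Rows from shard a still arrive as 1, 2, 3, but there is no meaningful answer to whether a2 came "before" b1 across shards.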
The solution if you need to listen on a _changes feed and then restart
from where you left off later:
The algorithm for correctly consuming the changes feed is:
1. read /dbname/_changes
2. process each row idempotently
3. periodically (every X seconds or every X rows) store the "seq" value of the last row you processed
If you ever crash, or if you weren't using continuous=true, you can do
this same procedure again, but modified in step 1:
revised 1. read /dbname/_changes?since=X
where X is the value you saved in step 3. If you're not using
continuous mode then you could just record the "last_seq" value at the
end of consuming the non-continuous response. You run the risk of
reprocessing a lot more items, though.
With this scheme (which the replicator and all indexers follow), you
don't care if the results come out of order, and you don't need to
compare any two seq values.
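The procedure above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the replicator's actual code: fetchChanges, handleRow, and checkpoint are hypothetical, caller-supplied pieces, and the function is kept synchronous for brevity even though a real HTTP request would be asynchronous. It also assumes the non-continuous response shape, with a results array and a last_seq field.

```javascript
// fetchChanges(since): hypothetical function that requests /dbname/_changes
//   (adding ?since=... when since is not null) and returns the parsed body,
//   assumed to look like { results: [{ seq, id, changes }, ...], last_seq }.
// handleRow(row): hypothetical handler; it must be idempotent.
// checkpoint: hypothetical store with load() and save(seq).
function consumeChanges({ fetchChanges, handleRow, checkpoint, every = 100 }) {
  const since = checkpoint.load(); // null on the very first run
  const body = fetchChanges(since);

  let rowsSinceSave = 0;
  for (const row of body.results) {
    handleRow(row);
    rowsSinceSave += 1;
    if (rowsSinceSave >= every) {
      // The seq is stored as-is and never compared with another seq.
      checkpoint.save(row.seq);
      rowsSinceSave = 0;
    }
  }

  // Non-continuous mode: last_seq marks the end of what was consumed.
  checkpoint.save(body.last_seq);
  return body.last_seq;
}
```

After a crash between checkpoints, the next run resumes from the last saved seq, so up to `every` rows are handled a second time. That is why handleRow has to be idempotent.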
You do need to ensure you can correctly process the same change
multiple times. For an example of that, consider the replicator: when
it sees a row from a changes feed, it asks the target database if it
contains the _id and _rev values from that row. If it does, the
replicator moves on to the next row. If it doesn't, it tries to write
the document in that row to the target database. In the event of a
crash, and therefore a call to _changes with a seq value from before
processing that row, it will ask the target database if it has the
_id/_rev again, only this time the target will say yes.
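A small sketch of that check-then-write pattern. It is only an illustration of idempotent processing, not the replicator's real implementation: the in-memory Map stands in for the target database, replicateRow is a hypothetical name, and the row shape (an id plus a changes array of revs) is assumed from the _changes response.

```javascript
// target: stand-in for the target database, a Map from _id to a Set of _revs.
// Returns how many revisions were actually written for this row.
function replicateRow(row, target) {
  let written = 0;
  for (const change of row.changes) {
    const revs = target.get(row.id) || new Set();
    if (revs.has(change.rev)) {
      continue; // the target already has this _id/_rev, so move on
    }
    revs.add(change.rev); // "write" the missing revision
    target.set(row.id, revs);
    written += 1;
  }
  return written;
}

const target = new Map();
const row = { seq: '1-abc', id: 'doc1', changes: [{ rev: '1-967a' }] };

console.log(replicateRow(row, target)); // 1 (first time: written)
console.log(replicateRow(row, target)); // 0 (replayed after a "crash": skipped)
```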