Incorrect Count returned by MongoDB (WiredTiger)

Posted 2019-03-25 05:44

Question:

This sounds odd, and I hope I am doing something wrong, but count() on my MongoDB collection is off by one.

I have a collection with (I am sure) 359671 documents. However, the count() command returns 359670.

I am executing the count() command using the mongo shell:

rs0:PRIMARY> db.COLLECTION.count()
359670

This is incorrect: it is not counting every document in my collection.

If I pass the following query to count(), I get the correct result:

rs0:PRIMARY> db.COLLECTION.count({_id: {$exists: true}})
359671

I believe this is a bug in WiredTiger. As far as I am aware, every document has the same structure: an integer _id field ranging from 0 to 359670, and a BinData field. I did not have this problem with the older storage engine or with MongoDB 2 (either change could be the cause).

Have I done something wrong? I do not want to use the {_id: {$exists: true}} query, as it takes 100x longer to complete.

Answer 1:

According to this issue, this behaviour can occur if MongoDB experiences a hard crash and is not shut down gracefully. When count() is called without a query, MongoDB probably just falls back to the collected statistics.

According to the article, calling db.COLLECTION.validate(true) should reset the counters.
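To make the mechanism in this answer concrete, here is a toy model in plain JavaScript. It is not MongoDB or WiredTiger code and does not reflect how the engine actually stores its statistics; it only assumes that the fast count() reads a stored counter, that a hard crash can lose one counter update, and that a full validation recounts the documents and rewrites the counter. ToyCollection, insertWithCrash and validateFull are hypothetical names.

```javascript
// Toy model only, not MongoDB or WiredTiger code: count() returns a stored
// counter, and a full validation recomputes that counter from the documents.
class ToyCollection {
  constructor() {
    this.docs = [];     // documents that actually reached disk
    this.countMeta = 0; // stored counter read by count()
  }

  // Normal insert: document and counter are both updated.
  insert(doc) {
    this.docs.push(doc);
    this.countMeta += 1;
  }

  // Hypothetical hard crash: the document is kept but the counter update is lost.
  insertWithCrash(doc) {
    this.docs.push(doc);
  }

  // Fast count: answer from the stored counter only.
  count() {
    return this.countMeta;
  }

  // Full validation: walk every document and overwrite the counter.
  validateFull() {
    this.countMeta = this.docs.length;
  }
}

const coll = new ToyCollection();
for (let i = 0; i < 4; i++) coll.insert({ _id: i });
coll.insertWithCrash({ _id: 4 });

console.log(coll.count()); // 4: the counter missed one document
coll.validateFull();
console.log(coll.count()); // 5: the counter matches the documents again
```

The point of the model is that validation is slow for the same reason the {_id: {$exists: true}} query is slow: it has to visit every document, but it only needs to run once to repair the counter.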



Answer 2:

As the documentation now states, db.collection.count() without a query parameter returns results based on the collection's metadata:

This may result in an approximate count. In particular:

  • On a sharded cluster, the resulting count will not correctly filter out orphaned documents.

  • After an unclean shutdown, the count may be incorrect.

Passing a query parameter, as you did in your second query ({_id: {$exists: true}}), forces count() to scan the collection instead of using its metadata.
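The rule quoted from the documentation can be sketched the same way, again in plain JavaScript rather than MongoDB code. Without a filter the count comes from a stored counter; any filter, even one that matches every document like {_id: {$exists: true}}, forces a per-document scan. makeCollection and count are hypothetical helpers, and the counter value of 4 is made up to simulate an unclean shutdown.

```javascript
// Toy model only, not MongoDB code: count() without a filter answers from a
// stored counter; count() with a filter tests every document.
function makeCollection(docs, countMeta) {
  return { docs, countMeta };
}

function count(coll, filter) {
  if (filter === undefined) {
    return coll.countMeta; // metadata path: fast, may be stale
  }
  return coll.docs.filter(filter).length; // scan path: slow, exact
}

// Five documents exist, but the counter was left at 4 (simulated unclean shutdown).
const docs = [0, 1, 2, 3, 4].map((i) => ({ _id: i }));
const coll = makeCollection(docs, 4);

// Stand-in for the filter {_id: {$exists: true}}: true when the field is present.
const idExists = (doc) => "_id" in doc;

console.log(count(coll));           // 4
console.log(count(coll, idExists)); // 5
```

This also explains the 100x difference in the question: the metadata path is a single lookup, while the filtered path touches all 359671 documents.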


Starting with MongoDB 4.0.3, count() is considered deprecated, and the following alternatives are recommended instead:

  • Exact count of documents:
db.collection.countDocuments({})

which, under the hood, performs the following "expensive" but accurate aggregation (expensive because the whole collection is scanned to count records):

db.collection.aggregate([{ $group: { _id: null, n: { $sum: 1 } } }])
  • Approximate count of documents:
db.collection.estimatedDocumentCount()

which does exactly what db.collection.count() does (it is actually a wrapper around count), using the collection's metadata.

It is therefore almost instantaneous, but may return an approximate result in the cases mentioned above.
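The two replacements can be contrasted with a small sketch (plain JavaScript over an in-memory array, hypothetical function names, not driver or server code). It reads the aggregation literally: $group with _id: null puts every input document into one group and $sum: 1 adds 1 per document, so the exact count costs a pass over every matched document. countDocuments(filter) is documented as a $match on the filter followed by this $group; with {} the $match is a no-op. If nothing matches, $group emits no document, so the sketch reports an empty result as 0. The estimate just returns a stored number.

```javascript
// Toy model only, not MongoDB driver or server code.
// Reads the pipeline [{ $match: filter }, { $group: { _id: null, n: { $sum: 1 } } }]
// literally, over an in-memory array of documents.

// $group with _id: null: every input document falls into one group, and the
// accumulator $sum: 1 adds 1 per document. No input means no output document.
function groupCountAll(docs) {
  if (docs.length === 0) return [];
  const n = docs.reduce((acc) => acc + 1, 0);
  return [{ _id: null, n }];
}

// Exact count (countDocuments-like): $match by testing every document, then
// $group; an empty aggregation result is reported as 0.
function countDocumentsSketch(docs, filter = () => true) {
  const result = groupCountAll(docs.filter(filter));
  return result.length === 0 ? 0 : result[0].n;
}

// Estimated count (estimatedDocumentCount-like): return the stored counter
// without looking at the documents.
function estimatedDocumentCountSketch(countMeta) {
  return countMeta;
}

const docs = [{ _id: 0 }, { _id: 1 }, { _id: 2 }];
console.log(countDocumentsSketch(docs));                   // 3
console.log(countDocumentsSketch(docs, (d) => d._id > 0)); // 2
console.log(countDocumentsSketch([]));                     // 0
console.log(estimatedDocumentCountSketch(2));              // 2, even though 3 documents exist
```

In practice the choice follows the same trade-off as in the question: use the exact count when correctness matters (for example right after an unclean shutdown), and the estimate when speed matters and an occasional small discrepancy is acceptable.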