According to MongoDB documentation:
Bulk Operation Size
A bulk operation can have at most 1000 operations.
However, I was able to run bulk operations with a much larger operation count (around 300k operations) using the MongoDB 2.6 bulk operations API with node-mongodb-native (collection.initializeUnorderedBulkOp(), etc.).
Is this limit outdated, or am I just missing something? Do you know what the real limit is?
I opened a ticket in MongoDB's Jira. They replied that:
You're correct; this limit needs some clarification in the documentation. The limit is on the server, but client drivers hide the limit from application developers by splitting bulk operations into multiple batches.
That is an interesting statement. It is new to the documentation as of the 2.6 release, so you will see that it is not present in the section you reference for earlier releases.
Of course, the real limit is the 16MB BSON limit, as that would be the maximum size of what can be sent over the wire as what is effectively one BSON document. That becomes clearer when you realize this is a convenience API on top of things like the runCommand form of update, as shown there under "Bulk Updates", or of inserts, which can clearly take the same form.
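To make the "one BSON document" point a bit more concrete, here is a rough sketch of the command document that a batch of updates boils down to. The helper name buildUpdateCommand and the field somefield are purely illustrative and not part of any driver; the update, updates, q, u and ordered fields are those of the update command added in 2.6.

```javascript
// Hypothetical helper: builds the raw "update" command document that a
// batch of bulk updates is roughly equivalent to. Not part of any driver.
function buildUpdateCommand(collectionName, docs) {
    return {
        update: collectionName,                 // name of the target collection
        updates: docs.map(function (doc) {
            return {
                q: { _id: doc._id },                        // query selector
                u: { $set: { somefield: doc.somefield } }   // modification to apply
            };
        }),
        ordered: false                          // same semantics as an unordered bulk
    };
}

var cmd = buildUpdateCommand("collection", [
    { _id: 1, somefield: "a" },
    { _id: 2, somefield: "b" }
]);

// The whole thing is a single document, which is why the BSON size cap applies.
console.log(JSON.stringify(cmd, null, 2));
```

In the shell, a document of this shape is what you would hand to db.runCommand(), so every statement you add to the updates array counts against the size of that one document.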
The way I would usually write this up is to check the modulo of the current iteration count while adding bulk operations and only "execute" every so often. This is not the exact syntax for the node driver, but basically:
var bulk = db.collection.initializeUnorderedBulkOp();
var counter = 0;  // operations queued in the current batch

longArrayOrStream.forEach(function(doc) {
    bulk.find({ "_id": doc._id }).update(
        { "$set": { "somefield": doc.somefield } }
    );
    counter++;

    // Send a batch every 500 operations, then start a fresh bulk
    if ( counter % 500 == 0 ) {
        bulk.execute();
        counter = 0;
        bulk = db.collection.initializeUnorderedBulkOp();
    }
});

// Send whatever is left in the final partial batch
if ( counter > 0 )
    bulk.execute();
Or something similar, depending on what you are doing. That is significantly smaller than the volumes you are using, but it keeps things in manageable chunks that are not too big to send over the network and are certainly safely under 16MB.
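Since the listing above is shell-style and synchronous, here is a rough sketch of how the same pattern might look against an asynchronous driver. It assumes that collection is a node-mongodb-native Collection and that bulk.execute() returns a Promise when called without a callback, which is not true of every driver version. The function name bulkUpdateInChunks and the chunkSize parameter are my own, for illustration only.

```javascript
// Sketch: chunked unordered bulk updates with a promise-based driver API.
// Assumption: bulk.execute() returns a Promise when no callback is passed.
async function bulkUpdateInChunks(collection, docs, chunkSize) {
    var bulk = collection.initializeUnorderedBulkOp();
    var pending = 0;   // operations queued in the current batch
    var batches = 0;   // number of batches sent so far

    for (const doc of docs) {
        bulk.find({ _id: doc._id }).update({ $set: { somefield: doc.somefield } });
        pending++;

        if (pending === chunkSize) {
            await bulk.execute();                           // finish this batch first
            bulk = collection.initializeUnorderedBulkOp();  // start a fresh bulk
            pending = 0;
            batches++;
        }
    }

    // Flush the final partial batch, if any
    if (pending > 0) {
        await bulk.execute();
        batches++;
    }
    return batches;
}
```

Awaiting each execute() before queuing more keeps only one batch in flight at a time, which is the simplest thing to reason about. Something like bulkUpdateInChunks(collection, docs, 500) would then mirror the shell listing.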
So the BSON limit is the absolute hard limit. For practical reasons, though, and considering that the error status you may well want to check also comes back as one big document in the response, you probably want to keep these in smaller chunks.
It's all better than doing it one operation at a time, and I don't know if I would really want to send right up to 16MB over the wire at once, or check a 16MB response for possible errors.
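If you want to cap batches by size as well as by count, a minimal sketch could look like the following. Note the assumptions: splitIntoBatches is a made-up helper, and the JSON string length is only a rough stand-in for the real BSON size, so leave yourself plenty of headroom below 16MB.

```javascript
// Sketch: split operations into batches capped by count and by an
// estimated byte size. JSON length is only an approximation of BSON size.
function splitIntoBatches(ops, maxOps, maxBytes) {
    var batches = [];
    var current = [];
    var currentBytes = 0;

    ops.forEach(function (op) {
        var size = Buffer.byteLength(JSON.stringify(op), "utf8");

        // Start a new batch if adding this op would exceed either cap
        if (current.length > 0 &&
            (current.length >= maxOps || currentBytes + size > maxBytes)) {
            batches.push(current);
            current = [];
            currentBytes = 0;
        }
        current.push(op);
        currentBytes += size;
    });

    if (current.length > 0) batches.push(current);  // final partial batch
    return batches;
}

var ops = [];
for (var i = 0; i < 1200; i++) {
    ops.push({ q: { _id: i }, u: { $set: { somefield: "value" } } });
}

// 500 operations or about 1MB per batch, whichever is reached first
var batches = splitIntoBatches(ops, 500, 1024 * 1024);
console.log(batches.map(function (b) { return b.length; }));
// → [ 500, 500, 200 ]
```

With small documents like these the count cap is what kicks in; the byte cap only matters once individual operations get large.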