How to save 1 million records to MongoDB asynchronously

Posted 2019-07-17 05:33

Question:

I want to save 1 million records to MongoDB using JavaScript like this:

for (var i = 0; i < 1000000; i++) {
  model = buildModel(i);
  db.save(model, function(err, done) {
    console.log('cool');
  });
}

I tried it; it saved about 160 records, then hung for 2 minutes, then exited. Why?

Answer 1:

It blew up because you are not waiting for an asynchronous call to complete before moving on to the next iteration. This means you are building a "stack" of unresolved operations until it causes a problem. What is the name of this site again? Get the picture?
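
To see the contrast, here is a minimal sketch that serializes the writes instead: each save waits for the previous one to complete before starting the next (using the same hypothetical buildModel() and db.save() from your question). It never piles up pending operations, though at one round trip per document it is slow:

function saveNext(i) {
  if (i >= 1000000) return console.log('all done');
  db.save(buildModel(i), function(err) {
    if (err) throw err;
    saveNext(i + 1);  // recurse only once the previous save has completed
  });
}

saveNext(0);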

So this is not the best way to proceed with "bulk" insertions, and fortunately the underlying MongoDB driver has already thought about this. There is in fact a "Bulk API" available to make this a whole lot better. The example below assumes you already have the native driver available as the db object, but I prefer just using the .collection accessor from the model, along with the "async" module, to keep everything clear:

var async = require('async');

var bulk = Model.collection.initializeOrderedBulkOp();
var counter = 0;

async.whilst(
  // Iterator condition
  function() { return counter < 1000000 },

  // Do this in the iterator
  function(callback) {
    counter++;
    var model = buildModel(counter);
    bulk.insert(model);

    // Every 1000 documents, send the batch and start a new one
    if ( counter % 1000 == 0 ) {
      bulk.execute(function(err,result) {
        bulk = Model.collection.initializeOrderedBulkOp();
        callback(err);
      });
    } else {
      callback();
    }
  },

  // When all is done
  function(err) {
    // Flush any remaining queued operations before reporting completion
    if ( counter % 1000 != 0 ) {
      bulk.execute(function(err,result) {
        console.log( "inserted some more" );
        console.log( "I'm finished now" );
      });
    } else {
      console.log( "I'm finished now" );
    }
  }
);

The difference here is using "asynchronous" callback methods on completion, rather than just building up a stack, while also employing the "Bulk Operations API" to mitigate the asynchronous write calls by submitting everything in batch insert statements of 1000 entries.

This not only avoids "building up a stack" of function executions like your own example code, but also performs efficient "wire" transactions by not sending everything in individual statements, instead breaking them up into manageable "batches" for server commitment.
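
As a side note, a minimal sketch of the same batching idea using the driver's promise-based insertMany is below. This assumes a Node.js version with async/await support, that Model.collection exposes the native driver's insertMany, and the same hypothetical buildModel() from the question:

async function insertAll() {
  var batch = [];
  for (var i = 1; i <= 1000000; i++) {
    batch.push(buildModel(i));                   // buildModel() is the hypothetical from the question
    if (batch.length === 1000) {
      await Model.collection.insertMany(batch);  // one round trip per 1000 documents
      batch = [];
    }
  }
  if (batch.length > 0) {
    await Model.collection.insertMany(batch);    // flush the remainder
  }
}

insertAll().then(function() { console.log("I'm finished now"); });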



Answer 2:

You should probably use something like Async's eachLimit:

// Create an array of numbers 0-999999
var models = new Array(1000000);
for (var i = models.length - 1; i >= 0; i--)
  models[i] = i;

// Iterate over the array performing a MongoDB save operation for each item
// while never performing more than 20 parallel saves at the same time
async.eachLimit(models, 20, function iterator(model, next){
  // Build a model and save it to the DB, call next when finished
  db.save(buildModel(model), next);
}, function done(err){
  if (err) { // An error occurred while trying to save a model to the DB
    console.error(err);
  } else { // All 1,000,000 models have been saved to the DB
    console.log('Successfully saved ' + models.length + ' models to MongoDB.');
  }
});
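
If building a throwaway array of a million indices bothers you, a sketch of the same bounded-concurrency approach using async.timesLimit (assuming a version of the async library that ships it) drives the iteration directly:

// Run 1,000,000 iterations, at most 20 saves in flight at once
async.timesLimit(1000000, 20, function(n, next) {
  db.save(buildModel(n), next);  // n goes from 0 to 999999
}, function(err) {
  if (err) return console.error(err);
  console.log('Successfully saved 1,000,000 models to MongoDB.');
});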