What is the optimum bulk item count with InsertBatch?

Published 2019-06-18 21:40

Question:

I heard that large batch sizes don't really give any additional performance.

What is the optimum?

Answer 1:

If you call Insert to insert documents one at a time, there is a network round trip for each document. If you call InsertBatch to insert documents in batches, there is one network round trip per batch instead of one per document. InsertBatch is more efficient than Insert because it reduces the number of network round trips.
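To make the "one round trip per batch" idea concrete, here is a minimal Python sketch (the original answer is about the driver's InsertBatch, not Python). It splits documents into batches and hands each batch to a caller-supplied `send_batch` function, counting one round trip per call. `batches`, `insert_in_batches` and `send_batch` are hypothetical names for illustration; with PyMongo you might pass something like `collection.insert_many` as `send_batch`, but the driver may apply its own batching on top of this.

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []  # start a fresh list for the next batch
    if batch:
        yield batch  # final, possibly smaller, batch


def insert_in_batches(docs: Iterable[T], batch_size: int,
                      send_batch: Callable[[List[T]], None]) -> int:
    """Send docs via send_batch, one call per batch; return the number of calls.

    send_batch is a hypothetical stand-in for a driver call such as InsertBatch;
    each call is modeled as exactly one network round trip.
    """
    round_trips = 0
    for batch in batches(docs, batch_size):
        send_batch(batch)
        round_trips += 1
    return round_trips
```

For example, `insert_in_batches(range(25), 10, print)` makes three calls (10, 10 and 5 documents), where inserting one at a time would make 25.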

Suppose you had to insert 1,000,000 documents. You could analyze the number of network round trips for different batch sizes:

  • batch size 1: 1,000,000 round trips
  • batch size 10: 100,000 round trips
  • batch size 100: 10,000 round trips
  • batch size 1000: 1000 round trips
  • etc.
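The arithmetic in the list above is just the number of documents divided by the batch size, rounded up. A small Python sketch that reproduces the list and also computes the share of round trips eliminated compared with inserting one document at a time (the function names are illustrative, not part of any driver):

```python
def round_trips(n_docs: int, batch_size: int) -> int:
    """Batches needed for n_docs documents (ceiling division), one round trip each."""
    return (n_docs + batch_size - 1) // batch_size


def fraction_eliminated(n_docs: int, batch_size: int) -> float:
    """Share of round trips saved compared with a batch size of 1."""
    return 1 - round_trips(n_docs, batch_size) / n_docs


if __name__ == "__main__":
    N = 1_000_000
    for b in (1, 10, 100, 1000, 10_000):
        print(f"batch size {b}: {round_trips(N, b)} round trips, "
              f"{fraction_eliminated(N, b):.2%} eliminated")
```

Each tenfold increase in batch size cuts the remaining round trips by 90%, but the absolute number saved shrinks quickly.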

So you see that even a batch size as small as 10 has already eliminated 90% of the network round trips, and a batch size of 100 has eliminated 99% of the network round trips.

This is a somewhat simplified analysis because it ignores the fact that message sizes grow as batch sizes increase, but it's more or less accurate.

I don't think there is any one optimum batch size. I would say that larger batches are more performant, but once you have 10-100 documents per batch, making the batches larger gives only very small performance improvements.
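The diminishing returns can be illustrated with a toy cost model: total time is the number of round trips times a fixed per-round-trip latency, plus a per-document cost that batching does not remove. This is a sketch under assumed numbers (1 ms latency, 0.01 ms per document), not measurements, and it ignores server-side work and limits on message size.

```python
def estimated_seconds(n_docs: int, batch_size: int,
                      latency_s: float, per_doc_s: float) -> float:
    """Toy model: round trips * latency + documents * per-document cost.

    latency_s and per_doc_s are assumed inputs, not measured driver figures.
    """
    trips = (n_docs + batch_size - 1) // batch_size  # ceiling division
    return trips * latency_s + n_docs * per_doc_s


if __name__ == "__main__":
    N = 1_000_000
    LATENCY = 0.001    # assumed 1 ms per round trip
    PER_DOC = 0.00001  # assumed 0.01 ms of transfer/processing per document
    for b in (1, 10, 100, 1000, 10_000):
        print(f"batch size {b}: ~{estimated_seconds(N, b, LATENCY, PER_DOC):.1f} s")
```

Under these assumptions the total falls from roughly 1010 s at batch size 1 to about 20 s at 100, and then only to about 11 s at 1000 and 10.1 s at 10,000. Once the per-document cost dominates, larger batches barely help, which matches the answer's conclusion.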