How can I bulk insert with MongoDB (using PyMongo)?

Posted 2019-09-11 11:02

Question:

I have some Python code that uses PyMongo to insert many lists (of 1000 objects each) into a collection with a unique index (the field name is data_id).

However, some of the lists contain objects that duplicate ones in other lists (e.g., the second list of 1000 objects may have one or two records identical to objects already inserted from the first list).

Here's the problem: when the code bulk inserts a list of 1000 objects and one object has a previously inserted data_id, the insert fails for all 1000 objects. I am performing the insert as below:

# Fragment from a method; `record` is a list of up to 1000 documents.
inserted = False
try:
    collection = self.db[self.database][self.collection]
    collection.insert(record)
    inserted = True

except pymongo.errors.ConnectionFailure as e:
    sys.stdout.write('Error connecting to %s: %s\n' % (self.connection_url, e))
except Exception as e:
    sys.stdout.write('An error occurred in add_record: %s\n' % e)

return inserted

I have read somewhere (and now I can't find the reference anywhere!) that this can be avoided by telling Mongo the insert is unordered. So I tried passing ordered=False to the insert() call, but this fails with:

__init__() got an unexpected keyword argument 'ordered'

Does anyone know how to use PyMongo's insert() to perform an unordered insert of a list of objects, so that only the non-unique records fail and the rest are inserted as expected?

Answer 1:

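Found the answer. For those interested: .insert() is deprecated in PyMongo (since version 3.0; it was removed in 4.0), and the advice is to use .insert_many() for a list of documents. insert_many() accepts the ordered=False keyword. With ordered=False the server attempts every document in the list, so only the documents with a duplicate data_id are rejected; PyMongo then raises BulkWriteError to report them.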