Updating database from a list of dictionaries

2019-07-15 00:19发布

问题:

In Python, I have a list of dictionaries. The list is called members and each member has a unique id. For example, the list could look like this:

members = [{'id':1, 'val1':10, 'val2':11},
           {'id':2, 'val1':2, 'val2':34},
           {'id':3, 'val1':350, 'val2':9}]

I want to update my collection with the list of members, updating and inserting new entries as necessary.

Do I need to loop through the members, or is there a faster way?

Here's my attempt, which seems to do what I want but takes a while:

for m in members: 
     collection.update_one( {'id':m['id']}, {'$set': m)}, upsert = True)

Please note that this requires updating each db entry with a different value, namely the one corresponding to its id.

回答1:

With modern pymongo you can use .bulk_write() with the ReplaceOne bulk write operation, in your particular case, or an otherwise appropriate operation

from pymongo import MongoClient
from pymongo import ReplaceOne

client = MongoClient()

db = client.test

members = [
  { 'id': 1, 'val1': 10,  'val2': 11 },
  { 'id': 2, 'val1': 2,   'val2': 34 },
  { 'id': 3, 'val1': 350, 'val2': 9  }
]

db.testcol.bulk_write([
  ReplaceOne(
    { "id": m['id'] },
    m,
    upsert=True
  )
  for m in members
])

Ideally you would not be processing from a source "list" and instead reading in some external "stream" to keep memory requirements down. In a similar way you would just build up the operations list argument for say 1000 operations and then calling .bulk_write() to the server for only every 1000.

But the whole point is that with .bulk_write() you are sending your "batch" all at once and with only one response, rather than as separate requests with separate responses, which creates overhead and takes time.

Also using this API method actually uses the "Bulk API" underneath in supported servers, but degrades to making the individual calls for you when the server version does not support the "Bulk" methods.



回答2:

Use update_many if you have the same values.

But if you have different values, you can not update two documents at once with a MongoDB query. You will always have to do that in two queries. You can of course set a value of a field to the same value, or increment with the same number, but you can not do two distinct updates in MongoDB with the same query.