How to delete documents by query efficiently in mo

Posted 2019-01-07 21:48

Question:

I have a query that selects the documents to be removed. Right now, I remove them manually, like this (using Python):

for id in mycoll.find(query, fields={}):
  mycoll.remove(id)

This does not seem to be very efficient. Is there a better way?

EDIT

OK, I owe an apology for forgetting to mention the query details, because they matter. Here is the complete Python code:

def reduce_duplicates(mydb, max_group_size):
  # 1. Count the group sizes
  res = mydb.static.map_reduce(jstrMeasureGroupMap, jstrMeasureGroupReduce, 'filter_scratch', full_response = True)
  # 2. For each entry from the filter scratch collection having count > max_group_size
  deleteFindArgs = {'fields': {}, 'sort': [('test_date', ASCENDING)]}
  for entry in mydb.filter_scratch.find({'value': {'$gt': max_group_size}}):
    key = entry['_id']
    group_size = int(entry['value'])
    # 2b. query the original collection by the entry key, order it by test_date ascending, limit to the group size minus max_group_size.
    for id in mydb.static.find(key, limit = group_size - max_group_size, **deleteFindArgs):
      mydb.static.remove(id)
  return res['counts']['input']

So, what does it do? It reduces the number of duplicate keys to at most max_group_size per key value, leaving only the newest records. It works like this:

  1. Map-reduce the data to (key, count) pairs.
  2. Iterate over all the pairs with count > max_group_size.
  3. Query the data by key, sorting it ascending by timestamp (oldest first) and limiting the result to the count - max_group_size oldest records.
  4. Delete every record found.

As you can see, this accomplishes the task of reducing the duplicates to at most N newest records. The last two steps are foreach-found-remove, and this is the important detail of my question: it changes everything, and I should have been more specific about it. Sorry.
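The last two steps need not issue one remove per document. A minimal sketch, assuming PyMongo 3 or later (where `find` returns a cursor with `sort` and `limit`, and `delete_many` exists): collect the `_id`s of the oldest surplus documents, then delete them with a single `$in` filter. The helper name `trim_group` and the `date_field` parameter are hypothetical; `coll` stands for `mydb.static` and `key` for the filter document of one group.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING


def trim_group(coll, key, group_size, max_group_size, date_field="test_date"):
    """Delete the oldest surplus documents matching `key` with one delete command.

    `coll` is assumed to behave like a PyMongo 3+ Collection.
    Returns the number of documents deleted.
    """
    surplus = group_size - max_group_size
    if surplus <= 0:
        # Nothing to trim. This also avoids limit(0), which means "no limit".
        return 0
    # Fetch only the _id of the `surplus` oldest documents in this group.
    cursor = coll.find(key, {"_id": 1}).sort(date_field, ASCENDING).limit(surplus)
    ids = [doc["_id"] for doc in cursor]
    if not ids:
        return 0
    # One delete for the whole group instead of one per document.
    return coll.delete_many({"_id": {"$in": ids}}).deleted_count
```

Under these assumptions, the inner loop of `reduce_duplicates` could be replaced by a call such as `trim_group(mydb.static, key, group_size, max_group_size)`.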

Now, about the collection remove command. It does accept a query, but mine includes sorting and limiting. Can I do it with remove? Well, I have tried:

mydb.static.remove(key, limit = group_size - max_group_size, sort=[('test_date', ASCENDING)])

This attempt fails miserably. Moreover, it seems to screw up mongo. Observe:

C:\dev\poc\SDR>python FilterOoklaData.py
bad offset:0 accessing file: /data/db/ookla.0 - consider repairing database

Needless to say, the foreach-found-remove approach works and yields the expected results.

Now, I hope I have provided enough context and (hopefully) have restored my lost honour.

Answer 1:

You can use a query to remove all matching documents:

var query = {name: 'John'};
db.collection.remove(query);

Be wary, though: if the number of matching documents is high, your database might become less responsive. It is often advised to delete documents in smaller chunks.

Say you have 100k documents to delete from a collection. It is better to execute 100 queries that delete 1k documents each than one query that deletes all 100k documents.
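The chunking advice can be sketched in Python as follows, assuming a PyMongo 3+ collection. The helper name `delete_in_chunks` and the default chunk size of 1000 are illustrative, not part of any library. Each pass reads up to `chunk_size` matching `_id`s and removes them with one `delete_many` call.

```python
def delete_in_chunks(coll, query, chunk_size=1000):
    """Delete documents matching `query` in batches of at most `chunk_size`.

    `coll` is assumed to behave like a PyMongo 3+ Collection.
    Returns the total number of documents deleted.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = 0
    while True:
        # Read only the _id field of the next batch of matching documents.
        ids = [doc["_id"] for doc in coll.find(query, {"_id": 1}).limit(chunk_size)]
        if not ids:
            return total
        # Remove exactly this batch, then look for the next one.
        total += coll.delete_many({"_id": {"$in": ids}}).deleted_count
```

A short pause between batches may help keep the database responsive under load; whether that is needed depends on the deployment.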



Answer 2:

You can remove a document directly in the MongoDB shell:

db.mycoll.remove({_id:'your_id_here'});


Answer 3:

Would deleteMany() be more efficient? I recently found that remove() is quite slow when deleting 6M documents from a 100M-document collection. See the documentation at https://docs.mongodb.com/manual/reference/method/db.collection.deleteMany

db.collection.deleteMany(
   <filter>,
   {
      writeConcern: <document>,
      collation: <document>
   }
)
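For the Python code in the question, the PyMongo counterpart of deleteMany() is `Collection.delete_many()`, available since PyMongo 3.0. A minimal sketch, where the wrapper name `remove_matching` is hypothetical and `coll` is assumed to be a PyMongo collection:

```python
def remove_matching(coll, query):
    """Delete every document matching `query` with a single command.

    Replaces a find-then-remove loop; `coll` is assumed to behave like
    a PyMongo 3+ Collection. Returns the number of documents deleted.
    """
    result = coll.delete_many(query)
    return result.deleted_count
```

With the names from the question, this would be called as `remove_matching(mycoll, query)`.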


Answer 4:

Run this query from the command line (in the mongo shell):

db.users.remove( {"_id": ObjectId("5a5f1c472ce1070e11fde4af")});

If you are using Node.js, write this code:

User.remove({ _id: req.body.id }, function(err){...});


Tags: mongodb