Suppose we have a MongoDB collection that contains three documents:
db.collection.find()
{ _id:'...', user: 'A', title: 'Physics', Bank: 'Bank_A' }
{ _id:'...', user: 'A', title: 'Chemistry', Bank: 'Bank_B' }
{ _id:'...', user: 'B', title: 'Chemistry', Bank: 'Bank_A' }
We have a doc,
doc = { user: 'B', title: 'Chemistry', Bank:'Bank_A' }
If we use
db.collection.insert(doc)
here, the duplicate doc gets inserted into the database:
{ _id:'...', user: 'A', title: 'Physics', Bank: 'Bank_A' }
{ _id:'...', user: 'A', title: 'Chemistry', Bank: 'Bank_B' }
{ _id:'...', user: 'B', title: 'Chemistry', Bank: 'Bank_A' }
{ _id:'...', user: 'B', title: 'Chemistry', Bank: 'Bank_A' }
How can this duplicate be prevented? Which field(s) should be indexed, or is there another approach?
This updates the answers above: please use db.collection.updateOne() instead of db.collection.update(), and db.collection.createIndexes() instead of db.collection.ensureIndex().
Update: the methods update() and ensureIndex() have been deprecated since mongodb 2.*; you can see more details in the mongodb source, at the path ./mongodb/lib/collection.js. For update(), the recommended methods are updateOne, updateMany, or bulkWrite. For ensureIndex(), the recommended method is createIndexes.
Don't use insert.
Use update with upsert=true. Update looks for the document that matches your query and modifies the fields you specify; with upsert set to true, it inserts a new document when no document matches the query. So, for your example, you could use something like this:
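A minimal mongosh sketch of that upsert, assuming the three fields user, title and Bank together identify a document. The use of updateOne (rather than the deprecated update) and of $setOnInsert is a choice made here for illustration, and db.collection is the placeholder name from the question:

```javascript
// The document from the question that must not be stored twice.
const doc = { user: 'B', title: 'Chemistry', Bank: 'Bank_A' };

// Query on the fields that (by assumption) identify a document.
const filter = { user: doc.user, title: doc.title, Bank: doc.Bank };

// $setOnInsert only writes these fields when the upsert creates a new document;
// an existing match is left unchanged.
const update = { $setOnInsert: doc };
const options = { upsert: true };

// `db` only exists inside the mongo shell, so the call is guarded.
if (typeof db !== 'undefined') {
  const result = db.collection.updateOne(filter, update, options);
  printjson(result); // upsertedId is set only when a new document was inserted
}
```

Running this twice leaves a single matching document. Note that concurrent upserts can still race with each other unless a unique index also covers the same fields.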
You should use a unique compound index on the set of fields that uniquely identify a document within your MongoDB collection. For example, if you decide that the combination of user, title and Bank is your unique key, you would issue the following command:
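A sketch of such a command in mongosh, assuming user, title and Bank form the key. createIndex is used here in place of the deprecated ensureIndex, and db.collection is again the placeholder name from the question:

```javascript
// Index keys: 1 means ascending order on each field of the compound key.
const keys = { user: 1, title: 1, Bank: 1 };

// unique: true makes the server reject a second document with the same key values.
const indexOptions = { unique: true };

// `db` only exists inside the mongo shell, so the call is guarded.
if (typeof db !== 'undefined') {
  db.collection.createIndex(keys, indexOptions);
}
```

With this index in place, inserting the duplicate doc from the question fails with a duplicate key error instead of adding a fourth document.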
Please note that this should be done after you have removed previously stored duplicates.
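To find the duplicates that need removing first, one possible approach (a sketch, not the only way) is an aggregation that groups on the same three fields and keeps the groups with more than one member. The names ids and count are illustrative:

```javascript
// Group documents by the assumed key and collect the _id values in each group.
const pipeline = [
  {
    $group: {
      _id: { user: '$user', title: '$title', Bank: '$Bank' },
      ids: { $push: '$_id' },
      count: { $sum: 1 }
    }
  },
  // Keep only the key combinations that occur more than once.
  { $match: { count: { $gt: 1 } } }
];

// `db` only exists inside the mongo shell, so the call is guarded.
if (typeof db !== 'undefined') {
  db.collection.aggregate(pipeline).forEach(group => printjson(group));
}
```

Each printed group lists the _id values sharing one key; keeping one of them and deleting the rest clears the way for the unique index.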
http://docs.mongodb.org/manual/tutorial/create-a-compound-index/
http://docs.mongodb.org/manual/tutorial/create-a-unique-index/