How to remove duplicate entries from an array?
In the example below, "Algorithms in C++" was added twice.
The $unset modifier removes a whole field, but how do I remove a single entry from an array field?
> db.users.find()
{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"),
"favorites" : { "books" : [ "Algorithms in C++",
"The Art of Computer Programmning",
"Graph Theory",
"Algorithms in C++" ] },
"name" : "robert" }
What you have to do is use map-reduce to detect and count the duplicate entries, then use $set
to replace the entire books array on the document matching { "_id" : ObjectId("4f6cd3c47156522f4f45b26f") }.
This has been discussed several times here; please see:
Removing duplicate records using MapReduce
Fast way to find duplicates on indexed column in mongodb
http://csanz.posterous.com/look-for-duplicates-using-mongodb-mapreduce
http://www.mongodb.org/display/DOCS/MapReduce
How to remove duplicate record in MongoDB by MapReduce?
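For a single array like the one in the question, the "replace the whole array with $set" step can also be prepared on the client. The following is only a minimal sketch, not the map-reduce approach from the links: the helper names dedupe and buildDedupeUpdate are hypothetical, the _id is written as a plain string instead of an ObjectId, and the entries are assumed to be strings (a Set compares objects by reference, not by value).

```javascript
// Remove duplicates, keeping the first occurrence of each entry.
function dedupe(values) {
  return [...new Set(values)];
}

// Hypothetical helper: build the filter and $set update for one user document.
// Returns null when the array has no duplicates, so no write is needed.
function buildDedupeUpdate(doc) {
  const books = (doc.favorites && doc.favorites.books) || [];
  const unique = dedupe(books);
  if (unique.length === books.length) return null;
  return {
    filter: { _id: doc._id },
    update: { $set: { 'favorites.books': unique } },
  };
}

// Sample document from the question (_id simplified to a string).
const robert = {
  _id: '4f6cd3c47156522f4f45b26f',
  favorites: {
    books: [
      'Algorithms in C++',
      'The Art of Computer Programmning',
      'Graph Theory',
      'Algorithms in C++',
    ],
  },
  name: 'robert',
};

const op = buildDedupeUpdate(robert);
console.log(JSON.stringify(op, null, 2));
```

In a shell that supports this syntax, the result could then be applied per document, for example with db.users.updateOne(op.filter, op.update) inside a db.users.find().forEach(...) loop, skipping documents for which the helper returns null.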
As of MongoDB 2.2 you can use the aggregation framework with $unwind, $group and $project stages to achieve this:
db.users.aggregate([{$unwind: '$favorites.books'},
{$group: {_id: '$_id',
books: {$addToSet: '$favorites.books'},
name: {$first: '$name'}}},
{$project: {'favorites.books': '$books', name: '$name'}}
])
Note the need for the $project stage to nest books back under the favorites field, since the fields created by $group cannot be nested.
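To make the role of each stage concrete, here is a rough plain-JavaScript emulation of the pipeline above, run against the sample document with no database involved. It is an illustration only: the function names unwind, group and project are made up for this sketch, the _id is a plain string, and a JavaScript Set keeps insertion order whereas $addToSet makes no promise about element order.

```javascript
// Plain-JavaScript emulation of the three stages; no database required.

// $unwind: one output row per element of favorites.books.
function unwind(doc) {
  return doc.favorites.books.map((book) => ({ _id: doc._id, name: doc.name, book }));
}

// $group: one group per _id. $addToSet is modelled with a Set,
// and $first by keeping the name of the first row seen for that _id.
function group(rows) {
  const groups = new Map();
  for (const row of rows) {
    if (!groups.has(row._id)) {
      groups.set(row._id, { _id: row._id, name: row.name, books: new Set() });
    }
    groups.get(row._id).books.add(row.book);
  }
  return [...groups.values()];
}

// $project: nest the flat books field back under favorites.
function project(g) {
  return { _id: g._id, favorites: { books: [...g.books] }, name: g.name };
}

const users = [
  {
    _id: '4f6cd3c47156522f4f45b26f',
    favorites: {
      books: [
        'Algorithms in C++',
        'The Art of Computer Programmning',
        'Graph Theory',
        'Algorithms in C++',
      ],
    },
    name: 'robert',
  },
];

const rows = users.flatMap(unwind);       // 4 rows, one per book entry
const deduped = group(rows).map(project); // 1 document with distinct books
console.log(JSON.stringify(deduped, null, 2));
```

The intermediate rows array shows why $group is needed: after unwinding, the document exists once per book, and grouping by _id is what folds those copies back into one.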
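The easiest solution is to use $setUnion (available since Mongo 2.6; the $addFields stage used here needs 3.4+). Note that $setUnion does not guarantee the order of the resulting array: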
db.users.aggregate([
{'$addFields': {'favorites.books': {'$setUnion': ['$favorites.books', []]}}}
])
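Keep in mind that aggregate only returns the deduplicated documents; what is stored in the collection stays unchanged. On MongoDB 4.2 and later, update commands accept an aggregation pipeline, so the same $setUnion expression could be written back in place. A minimal sketch under that assumption follows; the helper name dedupeArrayUpdate is hypothetical, and the filter restricts the update to documents where the field really is an array, because $setUnion yields null for a missing field.

```javascript
// Hypothetical helper: builds a filter and an update pipeline that replace the
// array at `path` with its distinct elements (element order is not guaranteed).
function dedupeArrayUpdate(path) {
  return {
    // Only touch documents where the field is actually an array.
    filter: { [path]: { $type: 'array' } },
    // Update pipeline (assumes MongoDB 4.2+); $set here is the pipeline stage.
    pipeline: [{ $set: { [path]: { $setUnion: ['$' + path, []] } } }],
  };
}

const spec = dedupeArrayUpdate('favorites.books');
console.log(JSON.stringify(spec, null, 2));
```

In the shell this would be used as db.users.updateMany(spec.filter, spec.pipeline). Trying it on a copy of the collection first is advisable, since the original element order may be lost.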
Another (longer) version that is based on the idea from @kynan's answer, but preserves all the other fields without explicitly specifying them (Mongo 3.6+, since it relies on $mergeObjects):
> db.users.aggregate([
{'$unwind': {
'path': '$favorites.books',
// output the document even if its list of books is empty
'preserveNullAndEmptyArrays': true
}},
{'$group': {
'_id': '$_id',
'books': {'$addToSet': '$favorites.books'},
// arbitrary name that doesn't exist on any document
'_other_fields': {'$first': '$$ROOT'},
}},
{
// per the docs, when merged documents share a field, the value from the last document wins,
// so the grouped values (including the new deduplicated books array) take precedence
'$replaceRoot': {'newRoot': {'$mergeObjects': ['$_other_fields', "$$ROOT"]}}
},
// this stage wouldn't be necessary if the field wasn't nested
{'$addFields': {'favorites.books': '$books'}},
{'$project': {'_other_fields': 0, 'books': 0}}
])
{ "_id" : ObjectId("4f6cd3c47156522f4f45b26f"), "name" : "robert", "favorites" :
{ "books" : [ "The Art of Computer Programmning", "Graph Theory", "Algorithms in C++" ] } }