I'm using MongoDB's aggregate function to find duplicate documents in a collection whose documents look like this:
{_id, placement_id, placement_name, program_id, target}
I need to find all documents whose fields are identical apart from _id and placement_id, so these two documents count as duplicates:
{_id:3, placement_id:23, placement_name:"pl1", program_id:5, target:"-"}
{_id:7, placement_id:55, placement_name:"pl1", program_id:5, target:"-"}
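To make the duplicate criterion concrete, here is a minimal plain-JavaScript sketch of the grouping I have in mind, run over a small in-memory array rather than the real collection. The function name findDuplicateGroups and the third sample document are made up for illustration only.

```javascript
// Hypothetical helper (not part of MongoDB): groups in-memory documents by
// every field except _id and placement_id, then keeps groups of size >= 2.
function findDuplicateGroups(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Build a key from the three fields that define a duplicate.
    const key = JSON.stringify([doc.placement_name, doc.program_id, doc.target]);
    if (!groups.has(key)) {
      groups.set(key, {
        _id: {
          placement_name: doc.placement_name,
          program_id: doc.program_id,
          target: doc.target
        },
        total: 0,
        ids: []
      });
    }
    const group = groups.get(key);
    group.total += 1;       // same role as total: {$sum: 1}
    group.ids.push(doc._id); // extra bookkeeping so the duplicates can be located
  }
  // Same role as {$match: {total: {$gte: 2}}}.
  return Array.from(groups.values()).filter(function (g) { return g.total >= 2; });
}

// Sample data: the two documents above plus one made-up non-duplicate.
const sample = [
  {_id: 3, placement_id: 23, placement_name: "pl1", program_id: 5, target: "-"},
  {_id: 7, placement_id: 55, placement_name: "pl1", program_id: 5, target: "-"},
  {_id: 9, placement_id: 60, placement_name: "pl2", program_id: 5, target: "-"}
];
const duplicates = findDuplicateGroups(sample);
console.log(JSON.stringify(duplicates));
```

This only illustrates the intended result on a handful of documents; it is not something I could run against the full collection.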
The aggregate function I came up with is:
db.placements.aggregate(
    {$group: {
        _id: {placement_name: "$placement_name", program_id: "$program_id", target: "$target"},
        total: {$sum: 1}
    }},
    {$match: {total: {$gte: 2}}}
);
MongoDB then returned:
Error: Printing Stack Trace
at printStackTrace (src/mongo/shell/utils.js:37:15)
at DBCollection.aggregate (src/mongo/shell/collection.js:897:9)
at (shell):1:15
Wed Apr 2 07:43:23.090 aggregate failed: {
"errmsg" : "exception: aggregation result exceeds maximum document size (16MB)",
"code" : 16389,
"ok" : 0
} at src/mongo/shell/collection.js:898
The aggregation itself is correct: I tested it on a smaller collection and it works fine, but the production collection has about 80M documents.
When I run find() on those 80M documents, it works and the shell prompts me to type 'it' for more records. Why doesn't aggregate have the same capability? I also tried appending limit() to the end of the aggregate call, but that doesn't work either.
Is there a workaround? Thanks.