I'm trying to use the Mongo aggregation framework to find where there are records that have different unique sets within the same document. An example will best explain this:
Here is a document that is not my real data, but conceptually the same:
db.house.insert(
{
houseId : 123,
rooms: [{ name : 'bedroom',
owns : [
{name : 'bed'},
{name : 'cabinet'}
]},
{ name : 'kitchen',
owns : [
{name : 'sink'},
{name : 'cabinet'}
]}],
uses : [{name : 'sink'},
{name : 'cabinet'},
{name : 'bed'},
{name : 'sofa'}]
}
)
Notice that there are two hierarchies with similar items. It is also possible to use items that are not owned. I want to find documents like this one: where there is a house that uses something that it doesn't own.
So far I've built up the structure using the aggregate framework like below. This gets me to 2 sets of distinct items. However I haven't been able to find anything that could give me the result of a set intersection. Note that a simple count of set size will not work due to something like this: ['couch', 'cabinet'] compare to ['sofa', 'cabinet'].
{'$unwind':'$uses'}
{'$unwind':'$rooms'}
{'$unwind':'$rooms.owns'}
{'$group' : {_id:'$houseId',
use:{'$addToSet':'$uses.name'},
own:{'$addToSet':'$rooms.owns.name'}}}
produces:
{ _id : 123,
use : ['sink', 'cabinet', 'bed', 'sofa'],
own : ['bed', 'cabinet', 'sink']
}
How do I then find the set intersection of use and own in the next stage of the pipeline?
You were not very far from the full solution with aggregation framework - you needed one more thing before the $group
step and that is something that would allow you to see if all the things that are being used match up with something that is owned.
Here is the full pipeline
> db.house.aggregate(
{'$unwind':'$uses'},
{'$unwind':'$rooms'},
{'$unwind':'$rooms.owns'},
{$project: { _id:0,
houseId:1,
uses:"$uses.name",
isOkay:{$cond:[{$eq:["$uses.name","$rooms.owns.name"]}, 1, 0]}
}
},
{$group: { _id:{house:"$houseId",item:"$uses"},
hasWhatHeUses:{$sum:"$isOkay"}
}
},
{$match:{hasWhatHeUses:0}})
and its output on your document
{
"result" : [
{
"_id" : {
"house" : 123,
"item" : "sofa"
},
"hasWhatHeUses" : 0
}
],
"ok" : 1
}
Explanation - once you unwrap both arrays you now want to flag the elements where used item is equal to owned item and give them a non-0 "score". Now when you regroup things back by houseId you can check if any used items didn't get a match. Using 1 and 0 for score allows you to do a sum and now a match for item which has sum 0 means it was used but didn't match anything in "owned". Hope you enjoyed this!
So here is a solution not using the aggregation framework. This uses the $where operator and javascript. This feels much more clunky to me, but it seems to work so I wanted to put it out there if anyone else comes across this question.
db.houses.find({'$where':
function() {
var ownSet = {};
var useSet = {};
for (var i=0;i<obj.uses.length;i++){
useSet[obj.uses[i].name] = true;
}
for (var i=0;i<obj.rooms.length;i++){
var room = obj.rooms[i];
for (var j=0;j<room.owns.length;j++){
ownSet[room.owns[j].name] = true;
}
}
for (var prop in ownSet) {
if (ownSet.hasOwnProperty(prop)) {
if (!useSet[prop]){
return true;
}
}
}
for (var prop in useSet) {
if (useSet.hasOwnProperty(prop)) {
if (!ownSet[prop]){
return true;
}
}
}
return false
}
})
For MongoDB 2.6+ Only
As of MongoDB 2.6, there are set operations available in the project pipeline stage. The way to answer this problem with the new operations is:
db.house.aggregate([
{'$unwind':'$uses'},
{'$unwind':'$rooms'},
{'$unwind':'$rooms.owns'},
{'$group' : {_id:'$houseId',
use:{'$addToSet':'$uses.name'},
own:{'$addToSet':'$rooms.owns.name'}}},
{'$project': {int:{$setIntersection:["$use","$own"]}}}
]);