I have an array A, created in memory at runtime, and another array B saved in a MongoDB database. How can I efficiently get all the elements of A that are not in B?
You can assume that the array stored in MongoDB is several orders of magnitude bigger than the array created at runtime. For that reason I think that fetching the full array from MongoDB and computing the result on the client would not be efficient, but I have not found any query operator in MongoDB that computes the result I want.
Note that the $nin operator does the opposite of what I want, i.e., it retrieves the elements from B that are not in A.
Example:
Array A, created in my application at runtime, is [2, 3, 4].
Array B, stored in MongoDB, is [1, 3, 5, 6, 7, 10].
The result I expect is [2, 4].
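For reference, the expected result can be expressed as a plain in-memory JavaScript computation. This is only a sketch of the semantics, assuming both arrays fit in memory, which is exactly what the question rules out for B; it is mainly useful as a test oracle for a server-side solution. The function name `elementsNotIn` is hypothetical.

```javascript
// Reference semantics: keep every element of `a` that does not occur in `b`.
// Order and duplicates of `a` are preserved.
// Assumes `b` fits in memory, which the question explicitly rules out for B.
// JavaScript Set equality matches MongoDB for the integers in this example,
// but not necessarily for other BSON types (e.g. objects compare by reference).
function elementsNotIn(a, b) {
  const inB = new Set(b); // built once, then average O(1) lookups
  return a.filter((x) => !inB.has(x));
}

const A = [2, 3, 4];
const B = [1, 3, 5, 6, 7, 10];
console.log(elementsNotIn(A, B)); // [ 2, 4 ]
```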
The only operations that "modify" the documents returned in the response are .aggregate() and .mapReduce(), and of the two, .aggregate() is the better option.
In that case you are asking for $setDifference, which compares two "sets" and returns the elements of the first that are not in the second.
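To make those semantics concrete, here is a rough client-side emulation of what $setDifference returns. It is a sketch, not MongoDB's implementation: it collapses duplicates as a set operation does, it uses JavaScript Set equality (which agrees with MongoDB for the integers in this example), and since MongoDB does not promise any particular order for the result, the first-occurrence order kept here should not be relied on. The function name `setDifference` is hypothetical.

```javascript
// Client-side emulation of $setDifference: elements of `first` that are
// absent from `second`, with duplicates collapsed.
// MongoDB does not guarantee result order; this sketch happens to keep
// first-occurrence order, but callers should not depend on it.
function setDifference(first, second) {
  const exclude = new Set(second);
  const result = new Set();
  for (const x of first) {
    if (!exclude.has(x)) result.add(x); // Set.add ignores repeats
  }
  return [...result];
}

console.log(setDifference([2, 3, 4, 4], [1, 3, 5, 6, 7, 10])); // [ 2, 4 ]
```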
So, representing your array B as a field in a document:
db.collection.insertOne({ "b": [1, 3, 5, 6, 7, 10] })
Run the aggregation:
db.collection.aggregate([{ "$project": { "c": { "$setDifference": [ [2,3,4], "$b" ] } } }])
Which returns:
{ "_id" : ObjectId("596005eace45be96e2cb221b"), "c" : [ 2, 4 ] }
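Because A is created at runtime, the literal [2,3,4] above would in practice come from a variable. A minimal sketch, assuming the stored array lives in a field named "b" as in the example document; the function name `buildSetDifferencePipeline` and the `field` parameter are hypothetical. Wrapping the runtime array in $literal is a design choice not in the original answer: it stops a runtime value such as the string "$x" from being read as a field path.

```javascript
// Builds the $setDifference pipeline from an array computed at runtime.
// Assumption: the stored array is in `field` (default "b", as in the example).
function buildSetDifferencePipeline(runtimeArray, field = "b") {
  return [
    {
      $project: {
        // $literal passes the runtime array through without parsing it
        c: { $setDifference: [{ $literal: runtimeArray }, "$" + field] },
      },
    },
  ];
}

// Usage with the Node.js driver, assuming a `collection` handle already exists:
//   const docs = await collection
//     .aggregate(buildSetDifferencePipeline(A))
//     .toArray();
```

Only the (small) result array travels back to the client, which is the point of doing the comparison on the server.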
If you do not want "set" semantics and instead want to supply an array like [2,3,4,4], keeping its duplicates, then you can compare with $filter and $in instead, provided you have at least MongoDB 3.4:
db.collection.aggregate([
  { "$project": {
    "c": {
      "$filter": {
        "input": [2,3,4,4],
        "as": "a",
        "cond": {
          "$not": { "$in": [ "$$a", "$b" ] }
        }
      }
    }
  }}
])
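As with the set version, the 3.4+ pipeline can be built from the runtime array. A minimal sketch under stated assumptions: the stored array is in field "b", and a document without that field should behave as if the array were empty. The $ifNull guard is an addition not in the original answer, motivated by the fact that the aggregation $in operator raises an error when its second argument does not resolve to an array. The function name `buildFilterPipeline` is hypothetical.

```javascript
// Builds the MongoDB 3.4+ $filter/$in pipeline from a runtime array.
// Duplicates in `runtimeArray` are preserved in the output.
// Assumptions: stored array in `field` (default "b"); a missing field is
// treated as an empty array via $ifNull, since $in errors on a non-array.
function buildFilterPipeline(runtimeArray, field = "b") {
  return [
    {
      $project: {
        c: {
          $filter: {
            input: { $literal: runtimeArray }, // not parsed as field paths
            as: "a",
            cond: {
              $not: { $in: ["$$a", { $ifNull: ["$" + field, []] }] },
            },
          },
        },
      },
    },
  ];
}

// Usage with the Node.js driver, assuming a `collection` handle already exists:
//   const docs = await collection.aggregate(buildFilterPipeline([2, 3, 4, 4])).toArray();
```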
Or with $filter and $anyElementTrue in earlier versions (MongoDB 3.2, where $filter is available but the aggregation $in operator is not):
db.collection.aggregate([
  { "$project": {
    "c": {
      "$filter": {
        "input": [2,3,4,4],
        "as": "a",
        "cond": {
          "$not": {
            "$anyElementTrue": {
              "$map": {
                "input": "$b",
                "as": "b",
                "in": {
                  "$eq": [ "$$a", "$$b" ]
                }
              }
            }
          }
        }
      }
    }
  }}
])
Both return:
{ "_id" : ObjectId("596005eace45be96e2cb221b"), "c" : [ 2, 4, 4 ] }
This is of course "not a set", since 4 was provided as input "twice" and is therefore returned "twice" as well.
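To see why the pre-3.4 condition is equivalent to the $not/$in version, here is a plain-JavaScript mirror of its "cond" applied to one document's data. It is a sketch, not MongoDB's evaluator: strict equality (===) stands in for $eq, which agrees for the integers in this example but not necessarily for other BSON types. The function name `filterWithAnyElementTrue` is hypothetical.

```javascript
// Mirror of the pre-3.4 pipeline's "cond", for each element `a` of the input:
//   "$map" + "$eq"    -> one boolean per element of the stored array
//   "$anyElementTrue" -> true if any of those booleans is true
//   "$not"            -> keep `a` only when nothing matched
function filterWithAnyElementTrue(input, stored) {
  return input.filter((a) => {
    const matches = stored.map((b) => a === b); // "$map" with "$eq"
    const anyTrue = matches.some((m) => m === true); // "$anyElementTrue"
    return !anyTrue; // "$not"
  });
}

console.log(filterWithAnyElementTrue([2, 3, 4, 4], [1, 3, 5, 6, 7, 10])); // [ 2, 4, 4 ]
```

As written, this sketch builds a full boolean array over the stored array for every input element, so its work grows with the product of the two array lengths.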