Compare Arrays and Return the Difference

Posted 2019-02-20 04:07

Question:

I have an array A in memory, created at runtime, and another array B saved in a MongoDB database. How can I efficiently get all the elements of A that are not in B?

You can assume that the array stored in MongoDB is several orders of magnitude larger than the one created at runtime. For that reason, I think fetching the full array from MongoDB and computing the result in the application would be inefficient, but I have not found a query operation in MongoDB that computes the result I want.

Note that the $nin operator does the opposite of what I want, i.e., it retrieves the elements from B that are not in A.

Example:

Array A, created in my application at runtime, is [2, 3, 4].

Array B, stored in MongoDB, is [1, 3, 5, 6, 7, 10].

The result I expect is [2, 4].

Answer 1:

The only operations that can "modify" the document returned in a response are .aggregate() and .mapReduce(), and the former is the better option.

In that case you are asking for $setDifference, which compares two "sets" and returns the elements of the first that are not present in the second.

So representing a document with your array:

db.collection.insert({ "b": [1, 3, 5, 6, 7, 10] })

Run the aggregation:

db.collection.aggregate([{ "$project": { "c": { "$setDifference": [ [2,3,4], "$b" ] } } }])

Which returns:

{ "_id" : ObjectId("596005eace45be96e2cb221b"), "c" : [ 2, 4 ] }
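Since array A is built at runtime, the literal [2,3,4] above would normally come from a variable. The following is a minimal sketch of that, assuming a hypothetical helper named buildSetDifferencePipeline (it is not part of MongoDB) and the field name "b" from the example document. Wrapping the runtime array in $literal is an addition of this sketch, intended to stop string values that begin with "$" from being read as field paths.

```javascript
// Hypothetical helper (not a MongoDB API): builds the $setDifference
// pipeline shown above for an arbitrary runtime array and field name.
function buildSetDifferencePipeline(runtimeArray, fieldName) {
  return [
    {
      $project: {
        c: {
          // $literal keeps the runtime values from being parsed as expressions.
          $setDifference: [{ $literal: runtimeArray }, "$" + fieldName]
        }
      }
    }
  ];
}

const pipeline = buildSetDifferencePipeline([2, 3, 4], "b");
console.log(JSON.stringify(pipeline, null, 2));

// In the mongo shell the result could then be passed on as:
//   db.collection.aggregate(pipeline)
```

The helper only constructs the pipeline, so it can be inspected or logged before anything is sent to the server.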

If you do not want "sets" and instead want to supply an array like [2,3,4,4], then you can compare with $filter and $in instead, provided you have at least MongoDB 3.4:

db.collection.aggregate([
  { "$project": {
    "c": {
      "$filter": {
        "input": [2,3,4,4],
        "as": "a",
        "cond": {
          "$not": { "$in": [ "$$a", "$b" ]  }
        }
      }
    }   
  }}
])

Or with $filter and $anyElementTrue in earlier versions ($filter itself requires MongoDB 3.2):

db.collection.aggregate([
  { "$project": {
    "c": {
      "$filter": {
        "input": [2,3,4,4],
        "as": "a",
        "cond": {
          "$not": {
            "$anyElementTrue": {
              "$map": {
                "input": "$b",
                "as": "b",
                "in": {
                  "$eq": [ "$$a", "$$b" ]    
                }
              }    
            }
          }
        }    
      }
    }    
  }}
])

Where both would return:

{ "_id" : ObjectId("596005eace45be96e2cb221b"), "c" : [ 2, 4, 4 ] }

This is of course "not a set", since the 4 was supplied twice in the input and is therefore returned twice as well.
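To make the difference between the two behaviours concrete, here is a plain JavaScript sketch that mimics them in memory. It is an illustration only, not how the server implements these operators: the function names setDifference and filterNotIn are made up for this example, and the Set-based comparison only matches MongoDB's for simple values such as the numbers used here. MongoDB also does not guarantee the order of elements returned by set operators, whereas this sketch keeps input order.

```javascript
// Mimics $setDifference: duplicates in `a` are collapsed before comparing.
function setDifference(a, b) {
  const inB = new Set(b);
  return [...new Set(a)].filter((x) => !inB.has(x));
}

// Mimics $filter with $not/$in: every element of `a` is tested on its own,
// so duplicates in the input survive in the output.
function filterNotIn(a, b) {
  const inB = new Set(b);
  return a.filter((x) => !inB.has(x));
}

const a = [2, 3, 4, 4];
const b = [1, 3, 5, 6, 7, 10];

console.log(setDifference(a, b)); // [ 2, 4 ]
console.log(filterNotIn(a, b));   // [ 2, 4, 4 ]
```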