How to use $regex inside $or as an Aggregation Exp

2020-07-22 04:50发布

问题:

I have a query which allows the user to filter by some string field using a format that looks like: "Where description of the latest inspection is any of: foo or bar". This works great with the following query:

db.getCollection('permits').find({
  '$expr': {
    '$let': {
      vars: {
        latestInspection: {
          '$arrayElemAt': ['$inspections', {
            '$indexOfArray': ['$inspections.inspectionDate', {
              '$max': '$inspections.inspectionDate'
            }]
          }]
        }
      },
      in: {
        '$in': ['$$latestInspection.description', ['Fire inspection on property', 'Health inspection']]
      }
    }
  }
})

What I want is for the user to be able to use wildcards which I turn into regular expressions: "Where description of the latest inspection is any of: Health inspection or Found a * at the property".

The regex I get, don't need help with that. The problem I'm facing is, apparently the aggregation $in operator does not support matching by regular expressions. So I thought I'd build this using $or since the docs don't say I can't use regex. This was my best attempt:

db.getCollection('permits').find({
  '$expr': {
    '$let': {
      vars: {
        latestInspection: {
          '$arrayElemAt': ['$inspections', {
            '$indexOfArray': ['$inspections.inspectionDate', {
              '$max': '$inspections.inspectionDate'
            }]
          }]
        }
      },
      in: {
        '$or': [{
          '$$latestInspection.description': {
            '$regex': /^Found a .* at the property$/
          }
        }, {
          '$$latestInspection.description': 'Health inspection'
        }]
      }
    }
  }
})

Except I'm getting the error:

"Unrecognized expression '$$latestInspection.description'"

I'm thinking I can't use $$latestInspection.description as an object key but I'm not sure (my knowledge here is limited) and I can't figure out another way to do what I want. So you see I wasn't even able to get far enough to see if I can use $regex in $or. I appreciate all the help I can get.

回答1:

Everything inside $expr is an aggregation expression, and the documentation may not "say you cannot explicitly", but the lack of any named operator and the JIRA issue SERVER-11947 certainly say that. So if you need a regular expression then you really have no other option than using $where instead:

db.getCollection('permits').find({
  "$where": function() {
    var description = this.inspections
       .sort((a,b) => b.inspectionDate.valueOf() - a.inspectionDate.valueOf())
       .shift().description;

     return /^Found a .* at the property$/.test(description) ||
           description === "Health Inspection";

  }
})

You can still use $expr and aggregation expressions for an exact match, or just keep the comparison within the $where anyway. But at this time the only regular expressions MongoDB understands is $regex within a "query" expression.

If you did actually "require" an aggregation pipeline expression that precludes you from using $where, then the only current valid approach is to first "project" the field separately from the array and then $match with the regular query expression:

db.getCollection('permits').aggregate([
  { "$addFields": {
     "lastDescription": {
       "$arrayElemAt": [
         "$inspections.description",
         { "$indexOfArray": [
           "$inspections.inspectionDate",
           { "$max": "$inspections.inspectionDate" }
         ]}
       ]
     }
  }},
  { "$match": {
    "lastDescription": {
      "$in": [/^Found a .* at the property$/,/Health Inspection/]
    }
  }}
])

Which leads us to the fact that you appear to be looking for the item in the array with the maximum date value. The JavaScript syntax should be making it clear that the correct approach here is instead to $sort the array on "update". In that way the "first" item in the array can be the "latest". And this is something you can do with a regular query.

To maintain the order, ensure new items are added to the array with $push and $sort like this:

db.getCollection('permits').updateOne(
  { "_id": _idOfDocument },
  {
    "$push": {
      "inspections": {
        "$each": [{ /* Detail of inspection object */ }],
        "$sort": { "inspectionDate": -1 }
      }
    }
  }
)

In fact with an empty array argument to $each an updateMany() will update all your existing documents:

db.getCollection('permits').updateMany(
  { },
  {
    "$push": {
      "inspections": {
        "$each": [],
        "$sort": { "inspectionDate": -1 }
      }
    }
  }
)

These really only should be necessary when you in fact "alter" the date stored during updates, and those updates are best issued with bulkWrite() to effectively do "both" the update and the "sort" of the array:

db.getCollection('permits').bulkWrite([
  { "updateOne": {
    "filter": { "_id": _idOfDocument, "inspections._id": indentifierForArrayElement },
    "update": {
      "$set": { "inspections.$.inspectionDate": new Date() }
    }
  }},
  { "updateOne": {
    "filter": { "_id": _idOfDocument },
    "update": {
      "$push": { "inspections": { "$each": [], "$sort": { "inspectionDate": -1 } } }
    }
  }}
])

However if you did not ever actually "alter" the date, then it probably makes more sense to simply use the $position modifier and "pre-pend" to the array instead of "appending", and avoiding any overhead of a $sort:

db.getCollection('permits').updateOne(
  { "_id": _idOfDocument },
  { 
    "$push": { 
      "inspections": {
        "$each": [{ /* Detail of inspection object */ }],
        "$position": 0
      }
    }
  }
)

With the array permanently sorted or at least constructed so the "latest" date is actually always the "first" entry, then you can simply use a regular query expression:

db.getCollection('permits').find({
  "inspections.0.description": { 
    "$in": [/^Found a .* at the property$/,/Health Inspection/]
  }
})

So the lesson here is don't try and force calculated expressions upon your logic where you really don't need to. There should be no compelling reason why you cannot order the array content as "stored" to have the "latest date first", and even if you thought you needed the array in any other order then you probably should weigh up which usage case is more important.

Once reodered you can even take advantage of an index to some extent as long as the regular expressions are either anchored to the beginning of string or at least something else in the query expression does an exact match.

In the event you feel you really cannot reorder the array, then the $where query is your only present option until the JIRA issue resolves. Which is hopefully actually for the 4.1 release as currently targeted, but that is more than likely 6 months to a year at best estimate.