Mongodb Query based on number of fields in a recor

2019-01-29 01:40发布

I haven't been very good at Googling for this answer.

I have around 115 different fields that might be in each record. Collection is the output of a mapreduce on an amazingly large dataset.

Looks like this:

{_id:'number1', value:{'a':1, 'b':2, 'f':5}},
{_id:'number2', value:{'e':2, 'f':114, 'h':12}},
{_id:'number3', value:{'i':2, 'j':22, 'z':12, 'za':111, 'zb':114}}

Any ideas of how I might find records with 5 fields populated?

1条回答
Viruses.
2楼-- · 2019-01-29 01:56

It's still not a nice query to run, but there is a slightly more modern way to do it via $objectToArray and $redact

db.collection.aggregate([
  { "$redact": {
    "$cond": {
      "if": {
        "$eq": [
          { "$size": { "$objectToArray": "$value" } },
          3
        ]
      },
      "then": "$$KEEP",
      "else": "$$PRUNE"
    }
  }}
])

Where $objectToArray basically coerces the object into an array form, much like a combination of Object.keys() and .map() would in JavaScript.

It's still not a fantastic idea since it does require scanning the whole collection, but at least the aggregation framework operations use "native code" as opposed to JavaScript interpretation as is the case using $where.

So it's still generally advisable to change data structure and use a natural array as well as stored "size" properties where possible in order to make the most effective query operations.


Yes it is possible to do but not in the nicest way. The reason for this is that you are essentially using a $where operator query which uses JavaScript evaluation to match the contents. Not the most efficient way as this can never use an index and needs to test all the documents:

db.collection.find({ "$where": "return Object.keys(this.value).length == 3" })

This looks for the condition matching "three" elements, then only two of your listed documents would be returned:

{ "_id" : "number1", "value" : { "a" : 1, "b" : 2, "f" : 5 } }
{ "_id" : "number2", "value" : { "e" : 2, "f" : 114, "h" : 12 } }

Or for "five" fields or more you can do much the same:

db.numbers.find({ "$where": "return Object.keys(this.value).length >= 5" })

So the arguments to that operator are effectively JavaScript statements that are evaluated on the server to return where true.

A more efficient way is to store the "count" of the elements in the document itself. In this way you can "index" this field and the queries are much more efficient as each document in the collection selected by other conditions does not need to be scanned to determine the length:

{_id:'number1', value:{'a':1, 'b':2, 'f':5} count: 3},
{_id:'number2', value:{'e':2, 'f':114, 'h':12}, count: 3},
{_id:'number3', value:{'i':2, 'j':22, 'z':12, 'za':111, 'zb':114}, count: 5}

Then to get the documents with "five" elements you only need the simple query:

db.collection.find({ "count": 5 })

That is generally the most optimal form. But another point is that the general "Object" structure that you might be happy with from general practice is not something that MongoDB "plays well" with in general. The problem is "traversal" of elements in the object, and in this way MongoDB is much happier when you use an "array". And even in this form:

{
    '_id': 'number1', 
    'values':[
        { 'key': 'a', 'value': 1 },
        { 'key': 'b', 'value': 2 }, 
        { 'key': 'f', 'value': 5 }
    ],
},
{
    '_id': 'number2', 
    'values':[
        { 'key': 'e', 'value': 2 }, 
        { 'key': 'f', 'value': 114 }, 
        { 'key': 'h', 'value': 12 }
    ],
},
{
    '_id':'number3', 
    'values': [
        { 'key': 'i', 'values': 2 }, 
        { 'key': 'j', 'values': 22 }, 
        { 'key': 'z'' 'values': :12 }, 
        { 'key': 'za', 'values': 111 },
        { 'key': 'zb', 'values': 114 }
    ]
}

So if you actually switch to an "array" format like that then you can do an exact length of an array with one version of the $size operator:

db.collection.find({ "values": { "$size": 5 } })

That operator can work for an exact value for an array length as that is a basic provision of what can be done with this operator. What you cannot do as is documented in a "in-equality" match. For that you need the "aggregation framework" for MongoDB, which is a better alternate to JavaScript and mapReduce operations:

db.collection.aggregate([
    // Project a size of the array
    { "$project": {
        "values": 1,
        "size": { "$size": "$values" }
    }},
    // Match on that size
    { "$match": { "size": { "$gte": 5 } } },
    // Project just the same fields 
    {{ "$project": {
        "values": 1
    }}
])

So those are the alternates. There is a "native" method available to aggregation and an array type. But it is fairly arguable that the JavaScript evaluation is also "native" to MongoDB, just not therefore implemented in native code.

查看更多
登录 后发表回答