MongoDB aggregate count of items in two arrays acr

2019-07-12 20:33发布

问题:

Here is my MongoDB collection schema:

company: String
model: String
cons: [String] // array of tags that were marked as "cons"
pros: [String] // array of tags that were marked as "pros"

I need to aggregate it so I get the following output:

[{
  "_id": {
    "company": "Lenovo",
    "model": "T400"
  },
  "tags": {
    tag: "SomeTag"
    pros: 124 // number of times, "SomeTag" tag was found in "pros" array in `Lenovo T400`
    cons: 345 // number of times, "SomeTag" tag was found in "cons" array in `Lenovo T400`
  }
}...]

I tried to do the following:

var aggParams = {};
aggParams.push({ $unwind: '$cons' });
aggParams.push({ $unwind: '$pros' });
aggParams.push({$group: {
  _id: {
    company: '$company',
    model: '$model',
    consTag: '$cons'
  },
  consTagCount: { $sum: 1 }
}});
aggParams.push({$group: {
  _id: {
    company: '$_id.company',
    model: '$_id.model',
    prosTag: '$pros'
  },
  prosTagCount: { $sum: 1 }
}});
aggParams.push({$group: {
  _id: {
    company:'$_id.company',
    model: '$_id.model'
  },
  tags: { $push: { tag: { $or: ['$_id.consTag', '$_id.prosTag'] }, cons: '$consTagCount', pros: '$prosTagCount'} }
}});

But I got the following result:

{
  "_id": {
    "company": "Lenovo",
    "model": "T400"
  },
  "tags": [
    {
      "tag": false,
      "pros": 7
    }
  ]
}

What is the right way to do this with aggregation?

回答1:

Yes this is a bit harder considering that there are multiple arrays, and if you try both at the same time you end up with a "cartesian condition" where one arrray multiplies the contents of the other.

Therefore, just combine the array content at the beginning, which probably indicates how you should be storing the data in the first place:

Model.aggregate(
    [
        { "$project": {
            "company": 1,
            "model": 1,
            "data": {
                "$setUnion": [
                    { "$map": {
                        "input": "$pros",
                        "as": "pro",
                        "in": {
                            "type": { "$literal": "pro" },
                            "value": "$$pro"
                        }
                    }},
                    { "$map": {
                        "input": "$cons",
                        "as": "con",
                        "in": {
                            "type": { "$literal": "con" },
                            "value": "$$con"
                        }
                    }}
                ]
            }
        }},
        { "$unwind": "$data" }
        { "$group": {
            "_id": { 
                "company": "$company",
                "model": "$model",
                "tag": "$data.value"
            },
            "pros": { 
                "$sum": { 
                    "$cond": [
                        { "$eq": [ "$data.type", "pro" ] },
                        1,
                        0
                    ]
                }
            },
            "cons": { 
                "$sum": { 
                    "$cond": [
                        { "$eq": [ "$data.type", "con" ] },
                        1,
                        0
                    ]
                }
            }
        }
    ], 
    function(err,result) {

    }
)

So via the first $project stage the $map operators are adding the "type" value to each item of each array. Not that it really matters here as all items should process "unique" anyway, the $setUnion operator "contatenates" each array into a singular array.

As mentioned earlier, you probably should be storing in this way in the first place.

Then process $unwind followed by $group, wherein each "pros" and "cons" is then evaluated via $cond to for it's matching "type", either returning 1 or 0 where the match is respectively true/false to the $sum aggregation accumulator.

This gives you a "logical match" to count each respective "type" within the aggregation operation as per the grouping keys specified.