MongoDB aggregate selective project

2019-03-16 13:46发布

问题:

I am running into problems reshaping my document during aggregation for grouping. Basically I want to push entries to fields depending in their type. I have a structure as follows:

_id: P1
   entities: [{type: A, val: X}, {type: B, val: X}, {type: A, val: Y}]
   ...

I want to $unwind and $project these entities so that get a structure like:

_id: P1
   A: [X]
   B: []
_id: P1
   A: [Y]
   B: []
_id: P1
   A: []
   B: [X]

so i can perform a grouping by A or B or both, i.e.

$group: {
   _id: {
       A: $A, 
       B: $B
    }
    count: {$sum : 1}

I thought I could simply do:

$unwind: $entities
$project: {
   id: §id
   A: {"$cond":[{"$eq":["$type","A"]},"$code"]}
   B: {"$cond":[{"$eq":["$type","B"]},"$code"]}
}
$group: {
  _id: "$id"
  A: {$addToSet : "$A"}
}

or failing that something like

$unwind: $entities
$group: {
  _id: "$id"
  A: {"$cond":[{"$eq":["$type","A"]},$push: "$code", null]}
  ...
}

but both versions fail because I cannot do nothing on else and I did not manage to use $push inside a conditional. The closest I got is project depending on the type, but since I could not find a way not to add anything to the field when there was no match, I end up with:

_id: P1
   A: [X,null,Y]
   B: [null,X,null]

which messes up the counting. My second idea was to filter the arrays to remove null. But I did not find a way to remove entities, because again $cond wouldnt let me specify an empty/"do nothing" else case.

I have a feeling it could work with grouping by type and content with matching of the required types, but because I have many types and arbitrary groupings resulting in a grouping tree, this might become very complicated. Ideas or hints to mistakes would be very welcome.

Thank you

EDIT: The solution based on the accepted anwser

I had to slightly adapt it, to filter cases where all content of a type was null, because otherwise it would have been lost during matching and because I want to keep that knowledge. Thanks!

{$project:{
  A: {$cond: [
      {$eq: ["$A", [false]]},
      ["N/A"],
      "$A"
  ]},
  B: {$cond: [
      {$eq: ["$B", [false]]},
      ["N/A"],
      "$B"
  ]},
}},
{ "$unwind": "$A" },
{ "$match": { "A": { "$ne": false } } },
{ "$group": {
    "_id": "$_id",
    "A": { "$push": "$A" },
    "B": { "$first": "$B" }
}},
{ "$unwind": "$B" },
{ "$match": { "B": { "$ne": false } } },
{ "$group": {
    "_id": "$_id",
    "A": { "$first": "$A" },
    "B": { "$push": "$B" }
}}

回答1:

You seemed on the right track, there are just different approaches to removing those values of false from the conditional. You cannot have it return nothing, but you cn get rid of the values you do not want.

If you truly want "sets" and you have MongoDB 2.6 or greater available, then you basically filter out the false values using $setDifference:

db.entities.aggregate([
    { "$unwind": "$entities" },
    { "$group": {
        "_id": "$_id",
        "A": { 
            "$addToSet": {
                "$cond": [
                    { "$eq": [ "$entities.type", "A" ] },
                    "$entities.val",
                    false
                ]
            }
        },
        "B": { 
            "$addToSet": {
                "$cond": [
                    { "$eq": [ "$entities.type", "B" ] },
                    "$entities.val",
                    false
                ]
            }
        }
    }},
    { "$project": {
        "A": {
            "$setDifference": [ "$A", [false] ]
        },
        "B": {
            "$setDifference": [ "$B", [false] ]
        }
    }}
])

Or just as one step using the $map operator inside $project:

db.entities.aggregate([
    {"$project": {
        "A": {
             "$setDifference": [
                 {
                     "$map": {
                         "input": "$entities",
                         "as": "el",
                         "in": {
                             "$cond": [
                                 { "$eq": [ "$$el.type", "A" ] },
                                 "$$el.val",
                                 false
                             ]
                         }
                     }
                 },
                 [false]
             ]
         },
        "B": {
             "$setDifference": [
                 {
                     "$map": {
                         "input": "$entities",
                         "as": "el",
                         "in": {
                             "$cond": [
                                 { "$eq": [ "$$el.type", "B" ] },
                                 "$$el.val",
                                 false
                             ]
                         }
                     }
                 },
                 [false]
             ]
         }
    }}
])

Or otherwise stay with general $unwind and $match operators to filter these:

db.entities.aggregate([
    { "$unwind": "$entities" },
    { "$group": {
        "_id": "$_id",
        "A": { 
            "$push": {
                "$cond": [
                    { "$eq": [ "$entities.type", "A" ] },
                    "$entities.val",
                    false
                ]
            }
        },
        "B": { 
            "$push": {
                "$cond": [
                    { "$eq": [ "$entities.type", "B" ] },
                    "$entities.val",
                    false
                ]
            }
        }
    }},
    { "$unwind": "$A" },
    { "$match": { "A": { "$ne": false } } },
    { "$group": {
        "_id": "$_id",
        "A": { "$push": "$A" },
        "B": { "$first": "$B" }
    }},
    { "$unwind": "$B" },
    { "$match": { "B": { "$ne": false } } },
    { "$group": {
        "_id": "$_id",
        "A": { "$first": "$A" },
        "B": { "$push": "$B" }
    }}
])

Using either $push for normal arrays or $addToSet for unique sets.