Mongo Group and sum with two fields

Posted 2020-04-08 13:11

Question:

I have documents like:

{
   "from":"abc@sss.ddd",
   "to" :"ssd@dff.dff",
   "email": "Hi hello"
}

How can I count the messages exchanged between two people regardless of direction, i.e. add the "from A to B" count to the "from B to A" count?

I am able to calculate the one-way count. I want a single count that covers both directions.

db.test.aggregate([
    { $group: {
        "_id": { "from": "$from", "to": "$to" },
        "count": { $sum: 1 }
    }},
    { $sort: { "count": -1 } }
])

Answer 1:

Since you need to count the emails exchanged between two addresses, it makes sense to project a unified between field as follows:

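db.a.aggregate([
    // Optional: skip documents that lack any of the expected fields
    { $match: {
        to: { $exists: true },
        from: { $exists: true },
        email: { $exists: true }
    }},
    // Build a direction-independent key: both addresses, lowercased, smaller one first
    // (array literals in $project require MongoDB 3.2 or later)
    { $project: {
        between: { $cond: {
            if: { $lte: [ { $strcasecmp: [ "$to", "$from" ] }, 0 ] },
            then: [ { $toLower: "$to" }, { $toLower: "$from" } ],
            else: [ { $toLower: "$from" }, { $toLower: "$to" } ]
        }}
    }},
    // Count the documents per pair
    { $group: {
        "_id": "$between",
        "count": { $sum: 1 }
    }},
    // Most active pairs first
    { $sort: { count: -1 } }
])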

The unification logic should be clear from the example: between is an array holding both addresses in alphabetical order, so a message from A to B and a message from B to A produce the same key. The $match and $toLower parts are optional if you trust your data.
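To make the unified key concrete, here is a rough plain-JavaScript stand-in for the $project, $group and $sort stages. It is not MongoDB code, pairKey and countPairs are hypothetical helper names, the sample documents are made up, and JavaScript's string comparison may order unusual characters differently from $strcasecmp, so treat it as a sketch of the idea only.

```javascript
// Plain JavaScript stand-in for the $project + $group + $sort stages (not MongoDB code).
// pairKey and countPairs are hypothetical names used only for illustration.
function pairKey(from, to) {
  const a = from.toLowerCase();
  const b = to.toLowerCase();
  // Put the smaller address first so both directions share one key.
  return a <= b ? [a, b] : [b, a];
}

function countPairs(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // Arrays are compared by reference in JavaScript, so use a string as the Map key.
    const key = JSON.stringify(pairKey(doc.from, doc.to));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Highest count first, like the final $sort stage.
  return [...counts.entries()]
    .map(([key, count]) => ({ _id: JSON.parse(key), count }))
    .sort((x, y) => y.count - x.count);
}

// Made-up sample data: two messages between the same pair, one to a third address.
const sample = [
  { from: "abc@sss.ddd", to: "ssd@dff.dff", email: "Hi hello" },
  { from: "ssd@dff.dff", to: "ABC@sss.ddd", email: "Hello back" },
  { from: "abc@sss.ddd", to: "xyz@qqq.rrr", email: "Ping" },
];
const result = countPairs(sample);
console.log(JSON.stringify(result));
```

The first two sample messages land in the same bucket even though their direction and letter case differ, which is the behaviour the question asks for.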

Documentation for operators used in the example:

  • $match
  • $exists
  • $project
  • $cond
  • $lte
  • $strcasecmp
  • $toLower
  • $group
  • $sum
  • $sort


Answer 2:

You basically need to treat the grouping _id as an array of the possible "to" and "from" values and then sort that array, so that the combination is in the same order in every document.

As a side note, in the messaging systems I typically deal with, the "to" and "from" sender/recipient fields are usually arrays to begin with, so that usually forms the base from which the different variations on this statement come.

First, the optimal MongoDB 3.2 statement for single addresses:

db.collection.aggregate([
    // Combine both fields into one array
    { "$project": {
        "people": [ "$to", "$from" ]
    }},

    // Unwind array
    { "$unwind": "$people" },

    // Sort array
    { "$sort": { "_id": 1, "people": 1 } },

    // Group document
    { "$group": {
        "_id": "$_id",
        "people": { "$push": "$people" }
    }},

    // Group people and count
    { "$group": {
        "_id": "$people",
        "count": { "$sum": 1 }
    }}
]);

Those are the basics; the only variations are in how the "people" array is constructed (stage 1 above).

MongoDB 3.x and 2.6.x - Arrays

{ "$project": {
    "people": { "$setUnion": [ "$to", "$from" ] }
}}
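As a rough illustration of what this stage contributes when "to" and "from" are already arrays, the plain-JavaScript sketch below mimics $setUnion followed by the sort from the main pipeline. It is not MongoDB code, peopleKey is a hypothetical helper name, and the addresses are made up. Since $setUnion leaves the order of its output unspecified, the explicit sort is what makes the key consistent.

```javascript
// Plain JavaScript approximation of the $setUnion stage plus the later sort
// (not MongoDB code; peopleKey is a hypothetical helper name).
function peopleKey(to, from) {
  // A Set drops duplicate addresses, much like $setUnion does.
  const union = [...new Set([...to, ...from])];
  // $setUnion does not promise any element order, so sort explicitly.
  return union.sort();
}

const forward = peopleKey(["ssd@dff.dff"], ["abc@sss.ddd"]);
const reverse = peopleKey(["abc@sss.ddd"], ["ssd@dff.dff"]);
// A sender who also appears among the recipients is only listed once.
const withCopy = peopleKey(["ssd@dff.dff", "abc@sss.ddd"], ["abc@sss.ddd"]);
console.log(forward, reverse, withCopy);
```

All three calls give the same two-element array, so those messages would fall into one group.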

MongoDB 3.x and 2.6.x - Fields to array

{ "$project": {
    "people": { 
        "$map": {
            "input": ["A","B"],
            "as": "el",
            "in": {
               "$cond": [
                   { "$eq": [ "A", "$$el" ] },
                   "$to",
                   "$from"
               ]
            }
        }
    }
}}
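The $map variant can look odd at first: the input array ["A","B"] carries no data, it only provides two slots, and the $cond fills the first slot with "to" and the second with "from". A plain-JavaScript equivalent of that one expression might look like this (not MongoDB code; the document is made-up sample data).

```javascript
// Plain JavaScript equivalent of the $map/$cond expression (not MongoDB code).
// The sample document is made up for illustration.
const doc = { to: "ssd@dff.dff", from: "abc@sss.ddd" };

// "A" and "B" are placeholders: "A" selects the "to" field, anything else selects "from".
const people = ["A", "B"].map((el) => (el === "A" ? doc.to : doc.from));

console.log(people);
```

The result is a two-element array in the order [to, from], which is why it still has to go through the unwind, sort and regroup steps of the main pipeline.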

MongoDB 2.4.x and 2.2.x - from fields

{ "$project": {
    "to": 1,
    "from": 1,
    "type": { "$const": [ "A", "B" ] }
}},
{ "$unwind": "$type" },
{ "$group": {
    "_id": "$_id",
    "people": {
        "$addToSet": {
            "$cond": [
                { "$eq": [ "$type", "A" ] },
                "$to",
                "$from"
            ]
        }
    }
}}

But in all cases:

  1. Get all recipients into a distinct array.

  2. Order the array consistently.

  3. Group on the "always in the same order" list of recipients.

Follow that and you cannot go wrong.
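For readers who want to check the three steps without a database, here is a minimal plain-JavaScript sketch of the same logic. It is not MongoDB code, groupByParticipants is a hypothetical helper name, and the short strings stand in for real addresses. It accepts "to" and "from" as either single values or arrays.

```javascript
// Plain JavaScript sketch of the three steps (not MongoDB code).
// groupByParticipants is a hypothetical helper name.
function groupByParticipants(docs) {
  const counts = new Map();
  for (const doc of docs) {
    // 1. Collect every participant into one distinct array.
    //    [].concat accepts both single values and arrays.
    const people = [...new Set([].concat(doc.to, doc.from))];
    // 2. Put the array into a consistent order.
    people.sort();
    // 3. Group on the ordered list (serialized, because Map compares arrays by reference).
    const key = JSON.stringify(people);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up sample: two messages between a and b, one that also involves c.
const grouped = groupByParticipants([
  { to: "b", from: "a" },
  { to: "a", from: "b" },
  { to: ["a", "c"], from: "b" },
]);
console.log([...grouped.entries()]);
```

The two a/b messages share one key, while the message that also involves c forms its own group.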