Group count with MongoDB using aggregation framewo

2019-01-12 17:39发布

问题:

Let's say my MongoDB schema looks like this:

{car_id: "...", owner_id: "..."}

This is a many-to-many relationship. For example, the data might look like this:

+-----+----------+--------+
| _id | owner_id | car_id |
+-----+----------+--------+
|   1 |        1 |      1 |
|   2 |        1 |      2 |
|   3 |        1 |      3 |
|   4 |        2 |      1 |
|   5 |        2 |      2 |
|   6 |        3 |      4 |
|   7 |        3 |      5 |
|   8 |        3 |      6 |
|   9 |        3 |      7 |
|  10 |        1 |      1 | <-- not unique
+-----+----------+--------+

I want to get the number of cars owned by each owner. In SQL, this might look like:

SELECT owner_id, COUNT(*) AS cars_owned
FROM (SELECT owner_id FROM car_owners GROUP BY owner_id, car_id) AS t
GROUP BY owner_id;

In this case, the result would look like this:

+----------+------------+
| owner_id | cars_owned |
+----------+------------+
|        1 |          3 |
|        2 |          2 |
|        3 |          4 |
+----------+------------+

How can I accomplish this same thing using MongoDB using the aggregation framework?

回答1:

To accommodate the potential duplicates, you need to use two $group operations:

db.test.aggregate([
    { $group: {
        _id: { owner_id: '$owner_id', car_id: '$car_id' }
    }},
    { $group: {
        _id: '$_id.owner_id',
        cars_owned: { $sum: 1 }
    }},
    { $project: {
        _id: 0,
        owner_id: '$_id',
        cars_owned: 1
    }}]
    , function(err, result){
        console.log(result);
    }
);

Gives a result with a format of:

[ { cars_owned: 2, owner_id: 10 },
  { cars_owned: 1, owner_id: 11 } ]


回答2:

$group is similar to SQL Group by command. In the below example, we're going to aggregate companies on the basis of the year in which they were founded. And calculate the average number of employees for each company.


db.companies.aggregate([{
    $group: {
      _id: {
        founded_year: "$founded_year"
      },
      average_number_of_employees: {
        $avg: "$number_of_employees"
      }
    }
  }, {
    $sort: {
      average_number_of_employees: -1
    }
  }
])

This aggregation pipeline has 2 stages

  1. $group
  2. $sort

Now, fundamental to the $group stage is the _id field that we specify as the part of the document. That is the value of the $group operator itself using a very strict interpretation of the arrogation framework syntax. _id is how we define, how we control, how we tune what the group stage uses to organize the documents that it sees.

The below query find the relationships of the people with companies using $sum operator:


db.companies.aggregate([{
  $match: {
    "relationships.person": {
      $ne: null
    }
  }
}, {
  $project: {
    relationships: 1,
    _id: 0
  }
}, {
  $unwind: "$relationships"
}, {
  $group: {
    _id: "$relationships.person",
    count: {
      $sum: 1
    }
  }
}, {
  $sort: {
    count: -1
  }
}])