Let's say my MongoDB schema looks like this:
{car_id: "...", owner_id: "..."}
This is a many-to-many relationship. For example, the data might look like this:
+-----+----------+--------+
| _id | owner_id | car_id |
+-----+----------+--------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 2 | 1 |
| 5 | 2 | 2 |
| 6 | 3 | 4 |
| 7 | 3 | 5 |
| 8 | 3 | 6 |
| 9 | 3 | 7 |
| 10 | 1 | 1 | <-- not unique
+-----+----------+--------+
I want to get the number of cars owned by each owner. In SQL, this might look like:
SELECT owner_id, COUNT(*) AS cars_owned
FROM (SELECT owner_id FROM car_owners GROUP BY owner_id, car_id) AS t
GROUP BY owner_id;
In this case, the result would look like this:
+----------+------------+
| owner_id | cars_owned |
+----------+------------+
| 1 | 3 |
| 2 | 2 |
| 3 | 4 |
+----------+------------+
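To sanity-check any pipeline against the numbers above, the expected result can be reproduced in plain JavaScript (Node.js, no MongoDB involved). This is only a sketch of what the SQL query computes, not a MongoDB API: a Map from owner_id to a Set of car_id values, where the Set absorbs the duplicate row with _id 10. The helper name countDistinctCars is made up for illustration.

```javascript
// Sample rows copied from the table above; row 10 repeats (owner 1, car 1).
const rows = [
  { _id: 1, owner_id: 1, car_id: 1 },
  { _id: 2, owner_id: 1, car_id: 2 },
  { _id: 3, owner_id: 1, car_id: 3 },
  { _id: 4, owner_id: 2, car_id: 1 },
  { _id: 5, owner_id: 2, car_id: 2 },
  { _id: 6, owner_id: 3, car_id: 4 },
  { _id: 7, owner_id: 3, car_id: 5 },
  { _id: 8, owner_id: 3, car_id: 6 },
  { _id: 9, owner_id: 3, car_id: 7 },
  { _id: 10, owner_id: 1, car_id: 1 },
];

// Hypothetical helper: count distinct car_id values per owner_id,
// which is what the SQL subquery plus outer GROUP BY computes.
function countDistinctCars(docs) {
  const carsByOwner = new Map(); // owner_id -> Set of car_id
  for (const { owner_id, car_id } of docs) {
    if (!carsByOwner.has(owner_id)) carsByOwner.set(owner_id, new Set());
    carsByOwner.get(owner_id).add(car_id); // adding an existing car_id is a no-op
  }
  return [...carsByOwner].map(([owner_id, cars]) => ({ owner_id, cars_owned: cars.size }));
}

console.log(countDistinctCars(rows)); // owner 1: 3, owner 2: 2, owner 3: 4
```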
How can I accomplish the same thing in MongoDB using the aggregation framework?
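To account for the potential duplicates, you need two $group stages: the first collapses duplicate (owner_id, car_id) pairs, and the second counts the remaining pairs for each owner: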
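db.test.aggregate([
  // Stage 1: one output document per distinct (owner_id, car_id) pair,
  // which collapses duplicate rows such as _id 10
  { $group: {
    _id: { owner_id: '$owner_id', car_id: '$car_id' }
  }},
  // Stage 2: count the distinct pairs for each owner
  { $group: {
    _id: '$_id.owner_id',
    cars_owned: { $sum: 1 }
  }},
  // Stage 3: expose the group key as owner_id and drop _id
  { $project: {
    _id: 0,
    owner_id: '$_id',
    cars_owned: 1
  }}
],
// The callback form depends on the client library; in the mongo shell,
// omit it and the returned cursor is printed directly.
function(err, result) {
  if (err) throw err;
  console.log(result);
});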
This gives a result in the following format:
[ { cars_owned: 2, owner_id: 10 },
{ cars_owned: 1, owner_id: 11 } ]
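To see why two $group stages are needed, the sketch below replays the pipeline in plain JavaScript over the question's sample rows. It only imitates the semantics of each stage (it is not how MongoDB executes them), and the helper names groupByPair, countPerOwner and project are hypothetical. Unlike this sketch, MongoDB does not guarantee any particular order for $group output.

```javascript
// Sample rows copied from the question's table; row 10 repeats (owner 1, car 1).
const rows = [
  { _id: 1, owner_id: 1, car_id: 1 }, { _id: 2, owner_id: 1, car_id: 2 },
  { _id: 3, owner_id: 1, car_id: 3 }, { _id: 4, owner_id: 2, car_id: 1 },
  { _id: 5, owner_id: 2, car_id: 2 }, { _id: 6, owner_id: 3, car_id: 4 },
  { _id: 7, owner_id: 3, car_id: 5 }, { _id: 8, owner_id: 3, car_id: 6 },
  { _id: 9, owner_id: 3, car_id: 7 }, { _id: 10, owner_id: 1, car_id: 1 },
];

// Stand-in for { $group: { _id: { owner_id: '$owner_id', car_id: '$car_id' } } }:
// one output document per distinct (owner_id, car_id) pair.
function groupByPair(docs) {
  const groups = new Map(); // serialized pair -> output document
  for (const d of docs) {
    const key = JSON.stringify([d.owner_id, d.car_id]);
    if (!groups.has(key)) {
      groups.set(key, { _id: { owner_id: d.owner_id, car_id: d.car_id } });
    }
  }
  return [...groups.values()];
}

// Stand-in for { $group: { _id: '$_id.owner_id', cars_owned: { $sum: 1 } } }:
// $sum: 1 adds one per incoming document, i.e. one per distinct pair.
function countPerOwner(docs) {
  const counts = new Map(); // owner_id -> count
  for (const d of docs) {
    const owner = d._id.owner_id;
    counts.set(owner, (counts.get(owner) || 0) + 1);
  }
  return [...counts].map(([owner, n]) => ({ _id: owner, cars_owned: n }));
}

// Stand-in for { $project: { _id: 0, owner_id: '$_id', cars_owned: 1 } }.
function project(docs) {
  return docs.map((d) => ({ owner_id: d._id, cars_owned: d.cars_owned }));
}

const pairs = groupByPair(rows);
const result = project(countPerOwner(pairs));
console.log(pairs.length); // 9: the duplicate row was collapsed
console.log(result);       // owner 1: 3, owner 2: 2, owner 3: 4
```

With only the second stage (a single $group on owner_id with $sum: 1), owner 1 would be reported with 4 cars, because row 10 would be counted as well.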
$group is similar to SQL's GROUP BY clause. In the example below, we group companies by the year in which they were founded and calculate the average number of employees for each founding year.
db.companies.aggregate([{
  $group: {
    _id: {
      founded_year: "$founded_year"
    },
    average_number_of_employees: {
      $avg: "$number_of_employees"
    }
  }
}, {
  $sort: {
    average_number_of_employees: -1
  }
}])
This aggregation pipeline has two stages: $group followed by $sort.
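As a rough model of what these two stages compute, the sketch below groups a few made-up company documents by founded_year, averages number_of_employees per group, and sorts the groups in descending order. It mirrors one detail of $avg worth knowing: documents where the field is missing or non-numeric are ignored rather than counted as zero (company D below). The data and the groupAvg helper are hypothetical.

```javascript
// Made-up company documents, not real data.
const companies = [
  { name: "A", founded_year: 2004, number_of_employees: 100 },
  { name: "B", founded_year: 2004, number_of_employees: 300 },
  { name: "C", founded_year: 2007, number_of_employees: 50 },
  { name: "D", founded_year: 2007 }, // no number_of_employees: skipped by $avg
  { name: "E", founded_year: 2010, number_of_employees: 10 },
];

// Hypothetical stand-in for the $group stage with an $avg accumulator.
function groupAvg(docs) {
  const groups = new Map(); // serialized _id -> { _id, sum, n }
  for (const d of docs) {
    const id = { founded_year: d.founded_year };
    const key = JSON.stringify(id);
    if (!groups.has(key)) groups.set(key, { _id: id, sum: 0, n: 0 });
    const g = groups.get(key);
    if (typeof d.number_of_employees === "number") { // only numeric values are averaged
      g.sum += d.number_of_employees;
      g.n += 1;
    }
  }
  return [...groups.values()].map((g) => ({
    _id: g._id,
    average_number_of_employees: g.n > 0 ? g.sum / g.n : null, // null if nothing numeric
  }));
}

// Stand-in for { $sort: { average_number_of_employees: -1 } };
// null averages would need extra handling, but none occur in this data.
const result = groupAvg(companies).sort(
  (a, b) => b.average_number_of_employees - a.average_number_of_employees
);
console.log(result); // 2004 (200), then 2007 (50), then 2010 (10)
```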
Fundamental to the $group stage is the _id field that we specify as part of the stage's document, which, under a strict reading of the aggregation framework syntax, is the value of the $group operator itself. The _id expression is how we define and control how the $group stage organizes the documents it receives: all incoming documents for which _id evaluates to the same value end up in the same group.
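A toy model of that idea: the sketch below applies three different _id expressions to the same made-up documents. Whatever the _id expression evaluates to becomes the group key, so a single field gives one group per value, a compound document gives one group per combination, and a constant such as null puts every document into a single group. The groupBy helper and the category_code field are hypothetical, and comparing keys with JSON.stringify is a simplification of how MongoDB compares values.

```javascript
// Made-up documents; category_code is a hypothetical field for illustration.
const companies = [
  { name: "A", founded_year: 2004, category_code: "web" },
  { name: "B", founded_year: 2004, category_code: "mobile" },
  { name: "C", founded_year: 2007, category_code: "web" },
];

// Hypothetical helper: idOf plays the role of the _id expression.
// Documents whose _id evaluates to the same value end up in the same group.
function groupBy(docs, idOf) {
  const groups = new Map(); // serialized _id -> { _id, count }
  for (const d of docs) {
    const id = idOf(d);
    const key = JSON.stringify(id); // simplified stand-in for MongoDB's value comparison
    if (!groups.has(key)) groups.set(key, { _id: id, count: 0 });
    groups.get(key).count += 1; // like count: { $sum: 1 }
  }
  return [...groups.values()];
}

// _id: "$founded_year" -> one group per year (2 groups)
const byYear = groupBy(companies, (d) => d.founded_year);
// _id: { founded_year: "$founded_year", category_code: "$category_code" } -> 3 groups
const byYearAndCategory = groupBy(companies, (d) => ({
  founded_year: d.founded_year,
  category_code: d.category_code,
}));
// _id: null -> every document in one group
const all = groupBy(companies, () => null);

console.log(byYear, byYearAndCategory, all);
```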
The query below counts how many company relationships each person has, using the $sum operator:
db.companies.aggregate([{
  $match: {
    "relationships.person": {
      $ne: null
    }
  }
}, {
  $project: {
    relationships: 1,
    _id: 0
  }
}, {
  $unwind: "$relationships"
}, {
  $group: {
    _id: "$relationships.person",
    count: {
      $sum: 1
    }
  }
}, {
  $sort: {
    count: -1
  }
}])
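The sketch below replays that pipeline in plain JavaScript over a few invented company documents, assuming each relationships entry holds a person sub-document (that shape, the permalink field and the data are assumptions here). It shows the role of $unwind: each array element becomes its own document, so the following $group counts one per relationship. The $match filter is only approximated.

```javascript
// Invented documents; Hooli has no relationships field at all.
const companies = [
  { name: "Acme", relationships: [
    { person: { permalink: "alice" } },
    { person: { permalink: "bob" } },
  ] },
  { name: "Globex", relationships: [{ person: { permalink: "alice" } }] },
  { name: "Hooli" },
];

// Rough stand-in for { $match: { "relationships.person": { $ne: null } } }
const matched = companies.filter(
  (c) => Array.isArray(c.relationships) && c.relationships.every((r) => r.person != null)
);
// { $project: { relationships: 1, _id: 0 } }
const projected = matched.map((c) => ({ relationships: c.relationships }));
// { $unwind: "$relationships" }: one document per array element
const unwound = projected.flatMap((c) => c.relationships.map((r) => ({ relationships: r })));

// { $group: { _id: "$relationships.person", count: { $sum: 1 } } }
const groups = new Map(); // serialized person -> { _id, count }
for (const d of unwound) {
  const key = JSON.stringify(d.relationships.person);
  if (!groups.has(key)) groups.set(key, { _id: d.relationships.person, count: 0 });
  groups.get(key).count += 1;
}

// { $sort: { count: -1 } }
const result = [...groups.values()].sort((a, b) => b.count - a.count);
console.log(result); // alice: 2, bob: 1
```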