I want to calculate the rating_average field of this object with the rating fields inside the array ratings. Can you help me to understand how to use aggregation with $avg?
{
"title": "The Hobbit",
"rating_average": "???",
"ratings": [
{
"title": "best book ever",
"rating": 5
},
{
"title": "good book",
"rating": 3.5
}
]
}
The aggregation framework in MongoDB 3.4 and newer offers the $reduce
operator which efficiently calculates the total without the need for extra pipelines. Consider using it as an expression to return the
total ratings and get the number of ratings using $size
. Together with $addFields
, the average can thus be calculated using the arithmetic operator $divide
as in the formula average = total ratings/number of ratings
:
db.collection.aggregate([
{
"$addFields": {
"rating_average": {
"$divide": [
{ // expression returns total
"$reduce": {
"input": "$ratings",
"initialValue": 0,
"in": { "$add": ["$$value", "$$this.rating"] }
}
},
{ // expression returns ratings count
"$cond": [
{ "$ne": [ { "$size": "$ratings" }, 0 ] },
{ "$size": "$ratings" },
1
]
}
]
}
}
}
])
Sample Output
{
"_id" : ObjectId("58ab48556da32ab5198623f4"),
"title" : "The Hobbit",
"ratings" : [
{
"title" : "best book ever",
"rating" : 5.0
},
{
"title" : "good book",
"rating" : 3.5
}
],
"rating_average" : 4.25
}
With older versions, you would need to first apply the $unwind
operator on the ratings
array field first as your initial aggregation pipeline step. This will deconstruct the ratings
array field from the input documents to output a document for each element. Each output document replaces the array with an element value.
The second pipeline stage would be the $group
operator which groups input documents by the _id
and title
keys identifier expression and applies the desired $avg
accumulator expression to each group that calculates the average. There is another accumulator operator $push
that preserves the original ratings array field by returning an array of all values that result from applying an expression to each document in the above group.
The final pipeline step is the $project
operator which then reshapes each document in the stream, such as by adding the new field ratings_average
.
So, if for instance you have a sample document in your collection (as from above and so below):
db.collection.insert({
"title": "The Hobbit",
"ratings": [
{
"title": "best book ever",
"rating": 5
},
{
"title": "good book",
"rating": 3.5
}
]
})
To calculate the ratings array average and projecting the value in another field ratings_average
, you can then apply the following aggregation pipeline:
db.collection.aggregate([
{
"$unwind": "$ratings"
},
{
"$group": {
"_id": {
"_id": "$_id",
"title": "$title"
},
"ratings":{
"$push": "$ratings"
},
"ratings_average": {
"$avg": "$ratings.rating"
}
}
},
{
"$project": {
"_id": 0,
"title": "$_id.title",
"ratings_average": 1,
"ratings": 1
}
}
])
Result:
/* 1 */
{
"result" : [
{
"ratings" : [
{
"title" : "best book ever",
"rating" : 5
},
{
"title" : "good book",
"rating" : 3.5
}
],
"ratings_average" : 4.25,
"title" : "The Hobbit"
}
],
"ok" : 1
}
This really could be written so much shorter, and this was even true at the time of writing. If you want an "average" simply use $avg
:
db.collection.aggregate([
{ "$addFields": {
"rating_average": { "$avg": "$ratings.rating" }
}}
])
The reason for this is that as of MongoDB 3.2 the $avg
operator gained "two" things:
The ability to process an "array" of arguments in a "expression" form rather than solely as an accumulator to $group
Benefits from the features of MongoDB 3.2 that allowed the "shorthand" notation of array expressions. Being either in composition:
{ "array": [ "$fielda", "$fieldb" ] }
or in notating a single property from the array as an array of the values of that property:
{ "$avg": "$ratings.rating" } // equal to { "$avg": [ 5, 3.5 ] }
In earlier releases you would have to use $map
in order to access the "rating"
property inside each array element. Now you don't.
For the record, even the $reduce
usage can be simplified:
db.collection.aggregate([
{ "$addFields": {
"rating_average": {
"$reduce": {
"input": "$ratings",
"initialValue": 0,
"in": {
"$add": [
"$$value",
{ "$divide": [
"$$this.rating",
{ "$size": { "$ifNull": [ "$ratings", [] ] } }
]}
]
}
}
}
}}
])
Yes as stated, this is really just re-implementing the existing $avg
functionality, and therefore since that operator is available then it is the one that should be used.
As you have your to-be-calculated-average data in an array, first you need to unwind it. Do it by using the $unwind
in your aggregation pipeline:
{$unwind: "$ratings"}
Then you may access each element of the array as an embedded document with key ratings
in the result documents of the aggregation. Then you just need to $group
by title
and calculate $avg
:
{$group: {_id: "$title", ratings: {$push: "$ratings"}, average: {$avg: "$ratings.rating"}}}
Then just recover your title
field:
{$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
So here is your result aggregation pipeline:
db.yourCollection.aggregate([
{$unwind: "$ratings"},
{$group: {_id: "$title",
ratings: {$push: "$ratings"},
average: {$avg: "$ratings.rating"}
}
},
{$project: {_id: 0, title: "$_id", ratings: 1, average: 1}}
])