Here is an example from MongoDB tutorial (here it collection ZIP Code db:
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
if I replace _id
with something else like word Test
, I will get error message:
"errmsg" : "exception: the group aggregate field 'Test' must be defined as an expression inside an object",
"code" : 15951,
"ok" : 0
Could anybody help me understand why I need _id
in my command? I thought MongoDB assigns IDs automatically, if used does not provide it.
The
_id
field is mandatory, but you can set it tonull
if you do not wish to aggregate with respect to a key, or keys. Not utilising it would result in a single aggregate value over the fields. It is thus acting a 'reserved word' in this context, indicating what the resulting 'identifier'/key is for each group.In your case, grouping by
_id: "$state"
would result inn
aggregate results oftotalPop
, provided there there aren
distinct values forstate
(akin toSELECT SUM() FROM table GROUP BY state
). Whereas,would provide a single result for
totalPop
(akin toSELECT SUM() FROM table
).This behaviour is well described in the group operator documentation.
We're going to understand the
_id
field within the$group
stage & look at some best practices for constructing_id
s in group aggregation stages. Let's look at this query:One thing which might not be clear to us is why the
_id
field is constructed this "document" way? We could have done it this way as well:We don't do it this way, because in these output documents - it's not explicit what exactly this number means. So, we actually don't know. And in some cases, that means there maybe confusion in interpreting these documents. So, another case maybe to group an
_id
document with multiple fields:$push
simply pushes the elements to generating arrays. Often, it might be required to group on promoted fields to upper level:It's also perfect to have an expression that resolves to a document as a
_id
key.In a
$group
stage,_id
is used to designate the group condition. You obviously need it.If you're familiar with the SQL world, think of it as the
GROUP BY
clause.Please note, in that context too,
_id
is really an unique identifier in the generated collection, as by definition$group
cannot produce two documents having the same value for that field.