MongoDB - objects? Why do I need _id in aggregate

Here is an example from MongoDB tutorial (here it collection ZIP Code db:

db.zipcodes.aggregate( [
   { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
   { $match: { totalPop: { $gte: 10*1000*1000 } } }
] )

if I replace _id with something else like word Test, I will get error message:

"errmsg" : "exception: the group aggregate field 'Test' must be defined as an expression inside an object",
"code" : 15951,
"ok" : 0

Could anybody help me understand why I need _id in my command? I thought MongoDB assigns IDs automatically, if used does not provide it.

标签： mongodb aggregation-framework

3条回答

我只想做你的唯一

2楼-- · 2020-02-28 05:36

The _id field is mandatory, but you can set it to null if you do not wish to aggregate with respect to a key, or keys. Not utilising it would result in a single aggregate value over the fields. It is thus acting a 'reserved word' in this context, indicating what the resulting 'identifier'/key is for each group.

In your case, grouping by _id: "$state" would result in n aggregate results of totalPop, provided there there are n distinct values for state (akin to SELECT SUM() FROM table GROUP BY state). Whereas,

$group : {_id : null, totalPop: { $sum: "$pop" }}}

would provide a single result for totalPop (akin to SELECT SUM() FROM table).

This behaviour is well described in the group operator documentation.

0人赞添加讨论(0) 举报

Melony?

3楼-- · 2020-02-28 05:41

We're going to understand the _id field within the $group stage & look at some best practices for constructing _ids in group aggregation stages. Let's look at this query:


db.companies.aggregate([{
  $match: {
    founded_year: {
      $gte: 2010
    }
  }
}, {
  $group: {
    _id: {
      founded_year: "$founded_year"
    },
    companies: {
      $push: "$name"
    }
  }
}, {
  $sort: {
    "_id.founded_year": 1
  }
}]).pretty()

MongoDB $group with document approach

One thing which might not be clear to us is why the _id field is constructed this "document" way? We could have done it this way as well:


db.companies.aggregate([{
  $match: {
    founded_year: {
      $gte: 2010
    }
  }
}, {
  $group: {
    _id: "$founded_year",
    companies: {
      $push: "$name"
    }
  }
}, {
  $sort: {
    "_id": 1
  }
}]).pretty()

MongoDB $group without document approach

We don't do it this way, because in these output documents - it's not explicit what exactly this number means. So, we actually don't know. And in some cases, that means there maybe confusion in interpreting these documents. So, another case maybe to group an _id document with multiple fields:


db.companies.aggregate([{
  $match: {
    founded_year: {
      $gte: 2010
    }
  }
}, {
  $group: {
    _id: {
      founded_year: "$founded_year",
      category_code: "$category_code"
    },
    companies: {
      $push: "$name"
    }
  }
}, {
  $sort: {
    "_id.founded_year": 1
  }
}]).pretty()

group an _id document with multiple fields in MongoDB

$push simply pushes the elements to generating arrays. Often, it might be required to group on promoted fields to upper level:


db.companies.aggregate([{
  $group: {
    _id: {
      ipo_year: "$ipo.pub_year"
    },
    companies: {
      $push: "$name"
    }
  }
}, {
  $sort: {
    "_id.ipo_year": 1
  }
}]).pretty()

group on promoted fields to upper level in MongoDB

It's also perfect to have an expression that resolves to a document as a _id key.

db.companies.aggregate([{
  $match: {
    "relationships.person": {
      $ne: null
    }
  }
}, {
  $project: {
    relationships: 1,
    _id: 0
  }
}, {
  $unwind: "$relationships"
}, {
  $group: {
    _id: "$relationships.person",
    count: {
      $sum: 1
    }
  }
}, {
  $sort: {
    count: -1
  }
}])

It's also perfect to have an expression that resolves to a document as a _id key in MongoDB

0人赞添加讨论(0) 举报

Anthone

4楼-- · 2020-02-28 05:48

In a $group stage, _id is used to designate the group condition. You obviously need it.

If you're familiar with the SQL world, think of it as the GROUP BY clause.

_{Please note, in that context too, _id is really an unique identifier in the generated collection, as by definition $group cannot produce two documents having the same value for that field.}

0人赞添加讨论(0) 举报

MongoDB - objects? Why do I need _id in aggregate

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间