MongoDB aggregation framework: $group + $project

Posted 2019-04-01 18:07

Question:

I have the following issue:

This query returns one result, which is what I want:

> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } } }])

{
    "result" : [
        {
            "_id" : "b91e51e9-6317-4030-a9a6-e7f71d0f2161",
            "version" : 1.2000000000000002
        }
    ],
    "ok" : 1
}

This query (I just added a projection so I can later query for the entire document) returns multiple results. What am I doing wrong?

> db.items.aggregate([ {$group: { "_id": "$id", version: { $max: "$version" } }, $project: { _id : 1 } }])

{
    "result" : [
        {
            "_id" : ObjectId("5139310a3899d457ee000003")
        },
        {
            "_id" : ObjectId("513931053899d457ee000002")
        },
        {
            "_id" : ObjectId("513930fd3899d457ee000001")
        }
    ],
    "ok" : 1
}
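For reference, in an aggregation pipeline each stage is its own object in the array, so "group, then project" is written as db.items.aggregate([ { $group: { _id: "$id", version: { $max: "$version" } } }, { $project: { _id: 1 } } ]), whereas the second query above puts $group and $project in the same object. The plain-JavaScript sketch below only illustrates what those two separate stages compute on an in-memory array; the sample documents and the helper names groupMaxVersion and projectId are hypothetical, not MongoDB APIs.

```javascript
// Hypothetical in-memory stand-ins for documents in the items collection.
const items = [
  { _id: "a1", id: "item-1", version: 1.0 },
  { _id: "a2", id: "item-1", version: 1.1 },
  { _id: "a3", id: "item-1", version: 1.2 },
];

// Mimics { $group: { _id: "$id", version: { $max: "$version" } } }:
// one output document per distinct id, keeping the largest version.
function groupMaxVersion(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const current = groups.get(doc.id);
    if (current === undefined || doc.version > current.version) {
      groups.set(doc.id, { _id: doc.id, version: doc.version });
    }
  }
  return [...groups.values()];
}

// Mimics a separate, following { $project: { _id: 1 } } stage:
// it reshapes each grouped document on its own.
function projectId(docs) {
  return docs.map((doc) => ({ _id: doc._id }));
}

const grouped = groupMaxVersion(items);  // one document: _id "item-1", version 1.2
const projected = projectId(grouped);    // one document: _id "item-1"
```

Note that after $group the documents only contain _id and version, so projecting _id yields the grouped id, not the original document.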

Answer 1:

Found the answer.

1. First, get the _ids (the highest _id for each item.id):

var groups = db.items.aggregate( [ 
  { '$match': { 'owner.id': '9e748c81-0f71-4eda-a710-576314ef3fa' } },
  { '$group': { _id: '$item.id', dbid: { $max: "$_id" } } } 
] ).toArray(); // mongo shell 2.6+: aggregate() returns a cursor

2. Then query the documents with those _ids:

var ids = groups.map(function (g) { return g.dbid; });
db.items.find({ _id: { '$in': ids } });

which will look like this:

db.items.find({ _id: { '$in': [ '1', '2', '3' ] } });
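The data flow of these two steps can be mirrored on an in-memory array. This is a plain-JavaScript sketch, not MongoDB code: the sample documents, the owner and item ids, and the helper names maxIdPerItem and findByIds are hypothetical, and plain strings stand in for ObjectIds.

```javascript
// Hypothetical sample documents; in MongoDB the _id values would be ObjectIds.
const items = [
  { _id: "01", owner: { id: "owner-A" }, item: { id: "x" } },
  { _id: "02", owner: { id: "owner-A" }, item: { id: "x" } },
  { _id: "03", owner: { id: "owner-A" }, item: { id: "y" } },
  { _id: "04", owner: { id: "owner-B" }, item: { id: "x" } },
];

// Step 1, like $match on owner.id + $group with dbid: { $max: "$_id" }.
function maxIdPerItem(docs, ownerId) {
  const best = new Map(); // item.id -> largest _id seen so far
  for (const doc of docs) {
    if (doc.owner.id !== ownerId) continue; // like $match
    const prev = best.get(doc.item.id);
    if (prev === undefined || doc._id > prev) best.set(doc.item.id, doc._id); // like $max
  }
  return [...best].map(([itemId, dbid]) => ({ _id: itemId, dbid }));
}

// Step 2, like find({ _id: { $in: ids } }): fetch the full documents.
function findByIds(docs, ids) {
  const wanted = new Set(ids);
  return docs.filter((doc) => wanted.has(doc._id));
}

const groups = maxIdPerItem(items, "owner-A");
const ids = groups.map((g) => g.dbid);
const latest = findByIds(items, ids); // full documents "02" and "03"
```

The second step is what recovers the entire document, which the $group output alone no longer contains.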


Answer 2:

(I know it's late, but I'm answering anyway so that others don't have to search for the right answer elsewhere.)

See Deka's answer; it will do the job.



Answer 3:

Not all accumulators are available in the $project stage, so we need to consider which accumulation belongs in $project and which belongs in $group. Let's take a look at this:


db.companies.aggregate([{
  $match: {
    funding_rounds: {
      $ne: []
    }
  }
}, {
  $unwind: "$funding_rounds"
}, {
  $sort: {
    "funding_rounds.funded_year": 1,
    "funding_rounds.funded_month": 1,
    "funding_rounds.funded_day": 1
  }
}, {
  $group: {
    _id: {
      company: "$name"
    },
    funding: {
      $push: {
        amount: "$funding_rounds.raised_amount",
        year: "$funding_rounds.funded_year"
      }
    }
  }
}, ]).pretty()

Here, $match keeps only companies whose funding_rounds array is not empty. $unwind then emits one document per element of funding_rounds for every company, and those documents flow into $sort and the later stages. The first thing we do with them is $sort on:

  1. funding_rounds.funded_year
  2. funding_rounds.funded_month
  3. funding_rounds.funded_day

In the $group stage, which groups by company name, the funding array is built with $push. $push is used as the value of a field we name in the $group stage, and it can push any valid expression. Here we push a document built from raised_amount and funded_year, and every pushed document is appended to the end of the array being accumulated. The output of $group is therefore a stream of documents whose _id holds the company name, each with its funding array.
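To make the stage-by-stage data flow concrete, here is a plain-JavaScript sketch that mimics $match, $unwind, $sort and $group/$push on a hypothetical in-memory array. It is not how MongoDB executes the pipeline, the sample companies are made up, and it assumes every company has a funding_rounds array.

```javascript
// Hypothetical companies; real documents carry many more fields.
const companies = [
  { name: "Acme", funding_rounds: [
    { raised_amount: 500, funded_year: 2010, funded_month: 6, funded_day: 1 },
    { raised_amount: 100, funded_year: 2008, funded_month: 1, funded_day: 15 },
  ] },
  { name: "Empty Co", funding_rounds: [] },
];

// $match { funding_rounds: { $ne: [] } } then $unwind: one document per round.
const unwound = companies
  .filter((c) => c.funding_rounds.length > 0)
  .flatMap((c) => c.funding_rounds.map((round) => ({ name: c.name, funding_rounds: round })));

// $sort by funded_year, funded_month, funded_day, all ascending.
unwound.sort((a, b) =>
  a.funding_rounds.funded_year - b.funding_rounds.funded_year ||
  a.funding_rounds.funded_month - b.funding_rounds.funded_month ||
  a.funding_rounds.funded_day - b.funding_rounds.funded_day);

// $group by company name; $push appends one small document per incoming round.
const byCompany = new Map();
for (const doc of unwound) {
  if (!byCompany.has(doc.name)) {
    byCompany.set(doc.name, { _id: { company: doc.name }, funding: [] });
  }
  byCompany.get(doc.name).funding.push({
    amount: doc.funding_rounds.raised_amount,
    year: doc.funding_rounds.funded_year,
  });
}
const grouped = [...byCompany.values()];
```

The push step needs state that survives from one document to the next, which is exactly what the following paragraphs contrast with $project.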

Notice that $push is available in $group stages but not in $project stages. This is because $group stages are designed to take a sequence of documents and accumulate values based on that stream of documents.

$project, on the other hand, works on one document at a time. So we can calculate an average over an array within a single document in a $project stage. But accumulating across documents, where each document that passes through the stage pushes a new value onto an array, is something $project is simply not designed to do. For that kind of operation we use $group.
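The difference can be sketched in plain JavaScript: a $project-style step is a map where each output depends only on its own input document, while a $group-style step is a reduce that carries state across documents. The documents and the scores field are hypothetical; the pipeline snippets in the comments only indicate the MongoDB stages being imitated ($avg on an array field inside $project is available from MongoDB 3.2).

```javascript
// Hypothetical documents, each holding an array of scores.
const docs = [
  { _id: 1, scores: [2, 4, 6] },
  { _id: 2, scores: [10] },
];

// $project-style: per-document work, like
// { $project: { avgScore: { $avg: "$scores" } } }.
const projected = docs.map((d) => ({
  _id: d._id,
  avgScore: d.scores.reduce((sum, s) => sum + s, 0) / d.scores.length,
}));

// $group-style: state carries over between documents, like
// { $group: { _id: null, allIds: { $push: "$_id" } } }.
const grouped = docs.reduce(
  (acc, d) => { acc.allIds.push(d._id); return acc; },
  { _id: null, allIds: [] }
);
```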

Let's take a look at another example:


db.companies.aggregate([{
  $match: {
    funding_rounds: {
      $exists: true,
      $ne: []
    }
  }
}, {
  $unwind: "$funding_rounds"
}, {
  $sort: {
    "funding_rounds.funded_year": 1,
    "funding_rounds.funded_month": 1,
    "funding_rounds.funded_day": 1
  }
}, {
  $group: {
    _id: {
      company: "$name"
    },
    first_round: {
      $first: "$funding_rounds"
    },
    last_round: {
      $last: "$funding_rounds"
    },
    num_rounds: {
      $sum: 1
    },
    total_raised: {
      $sum: "$funding_rounds.raised_amount"
    }
  }
}, {
  $project: {
    _id: 0,
    company: "$_id.company",
    first_round: {
      amount: "$first_round.raised_amount",
      article: "$first_round.source_url",
      year: "$first_round.funded_year"
    },
    last_round: {
      amount: "$last_round.raised_amount",
      article: "$last_round.source_url",
      year: "$last_round.funded_year"
    },
    num_rounds: 1,
    total_raised: 1,
  }
}, {
  $sort: {
    total_raised: -1
  }
}]).pretty()

In the $group stage, we use the $first and $last accumulators; because the documents were sorted by date before $group, they pick each company's earliest and latest rounds. As with $push, we can't use $first and $last as accumulators in $project stages, because $project stages are not designed to accumulate values across multiple documents; they reshape documents one at a time. The total number of rounds is calculated with the $sum operator: $sum: 1 adds 1 for every document grouped under a given _id, so it counts those documents, while $sum: "$funding_rounds.raised_amount" adds up the amounts. The $project stage may look complex, but it only makes the output pretty: it reshapes first_round and last_round and passes num_rounds and total_raised through from the previous stage. The final $sort lists companies by total_raised, highest first.
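What each accumulator keeps per company can be mimicked in plain JavaScript. This is a hypothetical sketch over made-up rounds that are already unwound and sorted (the company names and the url-a/url-b/url-c strings are invented), not how MongoDB evaluates the pipeline. It also assumes every round has a numeric raised_amount, whereas MongoDB's $sum ignores non-numeric values.

```javascript
// Hypothetical rounds, already unwound and sorted by date (as after $sort).
const sortedRounds = [
  { name: "Acme", funding_rounds: { raised_amount: 100, funded_year: 2008, source_url: "url-a" } },
  { name: "Acme", funding_rounds: { raised_amount: 500, funded_year: 2010, source_url: "url-b" } },
  { name: "Beta", funding_rounds: { raised_amount: 50, funded_year: 2012, source_url: "url-c" } },
];

// $group with $first, $last, $sum: 1 and $sum of a field.
const groups = new Map();
for (const doc of sortedRounds) {
  const round = doc.funding_rounds;
  let g = groups.get(doc.name);
  if (g === undefined) {
    // $first: the first document seen for a company fixes first_round.
    g = { _id: { company: doc.name }, first_round: round, num_rounds: 0, total_raised: 0 };
    groups.set(doc.name, g);
  }
  g.last_round = round;                  // $last: overwritten by each later document
  g.num_rounds += 1;                     // $sum: 1 counts documents
  g.total_raised += round.raised_amount; // $sum of a field adds the values
}

// $project reshapes each group on its own; then $sort by total_raised descending.
const reshape = (r) => ({ amount: r.raised_amount, article: r.source_url, year: r.funded_year });
const result = [...groups.values()]
  .map((g) => ({
    company: g._id.company,
    first_round: reshape(g.first_round),
    last_round: reshape(g.last_round),
    num_rounds: g.num_rounds,
    total_raised: g.total_raised,
  }))
  .sort((a, b) => b.total_raised - a.total_raised);
```

Only the loop needs cross-document state; the final map works on one group at a time, which mirrors the split between $group and $project.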