This question comes out of (as mine usually do) perusing the questions asked on SO and as such, raising another question for myself. So apart from the learning exercise in working towards a solution for a problem, I find that another question pops up, such as this.
The original question as yet remains unaccepted by the OP, and indeed has not been clarified as to what "they" wanted to achieve. But I did give my interpretation, in both the simple and long forms of arriving at a solution.
The process, in the end, has left me wondering that considering the long form of the solution, would there be some new feature to be introduced in the next (Currently expecting 2.6) MongoDB release, using the additional aggregation operators that have been introduced.
So the case is as follows:
Sample Documents
{
"tracked_item_type" : "Software",
"tracked_item_name" : "Word",
"duration" : 9540
}
{
"tracked_item_type" : "Software",
"tracked_item_name" : "Excel",
"duration" : 4000
}
{
"tracked_item_type" : "Software",
"tracked_item_name" : "Notepad",
"duration" : 4000
}
{
"tracked_item_type" : "Site",
"tracked_item_name" : "Facebook",
"duration" : 7920
}
{
"tracked_item_type" : "Site",
"tracked_item_name" : "Twitter",
"duration" : 5555
}
{
"tracked_item_type" : "Site",
"tracked_item_name" : "Digital Blasphemy",
"duration" : 8000
}
Desired Result
The top two results by each type, ordered by the total duration. Even though this is a small sample, duration is considered to be a $sum of many items.
{
"tracked_item_type": "Site",
"tracked_item_name": "Digital Blasphemy",
"duration" : 8000
}
{
"tracked_item_type": "Site",
"tracked_item_name": "Facebook",
"duration" : 7920
}
{
"tracked_item_type": "Software",
"tracked_item_name": "Word",
"duration" : 9540
}
{
"tracked_item_type": "Software",
"tracked_item_name": "Notepad",
"duration" : 4000
}
Aggregate Solution
This was my lengthy way to solve the problem
db.collection.aggregate([
// Group on the types and "sum" of duration
{"$group": {
"_id": {
"tracked_item_type": "$tracked_item_type",
"tracked_item_name": "$tracked_item_name"
},
"duration": {"$sum": "$duration"}
}},
// Sort by type and duration descending
{"$sort": { "_id.tracked_item_type": 1, "duration": -1 }},
/* The fun part */
// Re-shape results to "sites" and "software" arrays
{"$group": {
"_id": null,
"sites": {"$push":
{"$cond": [
{"$eq": ["$_id.tracked_item_type", "Site" ]},
{ "_id": "$_id", "duration": "$duration" },
null
]}
},
"software": {"$push":
{"$cond": [
{"$eq": ["$_id.tracked_item_type", "Software" ]},
{ "_id": "$_id", "duration": "$duration" },
null
]}
}
}},
// Remove the null values for "software"
{"$unwind": "$software"},
{"$match": { "software": {"$ne": null} }},
{"$group": {
"_id": "$_id",
"software": {"$push": "$software"},
"sites": {"$first": "$sites"}
}},
// Remove the null values for "sites"
{"$unwind": "$sites"},
{"$match": { "sites": {"$ne": null} }},
{"$group": {
"_id": "$_id",
"software": {"$first": "$software"},
"sites": {"$push": "$sites"}
}},
// Project out software and limit to the *top* 2 results
{"$unwind": "$software"},
{"$project": {
"_id": 0,
"_id": { "_id": "$software._id", "duration": "$software.duration" },
"sites": "$sites"
}},
{"$limit" : 2},
// Project sites, grouping multiple software per key, requires a sort
// then limit the *top* 2 results
{"$unwind": "$sites"},
{"$group": {
"_id": { "_id": "$sites._id", "duration": "$sites.duration" },
"software": {"$push": "$_id" }
}},
{"$sort": { "_id.duration": -1 }},
{"$limit": 2}
])
The "Not quite there yet" Output
And the point where aggregation falls short of getting to the final result. At least to my current understanding.
{
"result" : [
{
"_id" : {
"_id" : {
"tracked_item_type" : "Site",
"tracked_item_name" : "Digital Blasphemy"
},
"duration" : 8000
},
"software" : [
{
"_id" : {
"tracked_item_type" : "Software",
"tracked_item_name" : "Word"
},
"duration" : 9540
},
{
"_id" : {
"tracked_item_type" : "Software",
"tracked_item_name" : "Notepad"
},
"duration" : 4000
}
]
},
{
"_id" : {
"_id" : {
"tracked_item_type" : "Site",
"tracked_item_name" : "Facebook"
},
"duration" : 7920
},
"software" : [
{
"_id" : {
"tracked_item_type" : "Software",
"tracked_item_name" : "Word"
},
"duration" : 9540
},
{
"_id" : {
"tracked_item_type" : "Software",
"tracked_item_name" : "Notepad"
},
"duration" : 4000
}
]
}
],
"ok" : 1
}
This all seemed very reasonable (to Me anyway) that the result, while not complete could be post-processed in code in order to massage it into the desired form.
But indeed, it seems an exercise, and a point of intrigue as to whether this could be achieved with the use of any upcoming features for aggregation (or possibly another technique that has eluded me) to get to the desired result form.
So feel free to answer with any, suggestions / pointers as to how this could be achieved.