How can I aggregate my MongoDB results by ObjectId date. Example:
Default cursor results:
cursor = [
{'_id': ObjectId('5220b974a61ad0000746c0d0'),'content': 'Foo'},
{'_id': ObjectId('521f541d4ce02a000752763a'),'content': 'Bar'},
{'_id': ObjectId('521ef350d24a9b00077090a5'),'content': 'Baz'},
]
Projected results:
projected_cursor = [
{'2013-09-08':
{'_id': ObjectId('5220b974a61ad0000746c0d0'),'content': 'Foo'},
{'_id': ObjectId('521f541d4ce02a000752763a'),'content': 'Bar'}
},
{'2013-09-07':
{'_id': ObjectId('521ef350d24a9b00077090a5'),'content': 'Baz'}
}
]
This is what I'm currently using in PyMongo to achieve these results, but it's messy and I'd like to see how I can do it using MongoDB's aggregation framework (or even MapReduce):
cursor = db.find({}, limit=10).sort("_id", pymongo.DESCENDING)
messages = [x for x in cursor]
this_date = lambda x: x['_id'].generation_time.date()
dates = set([this_date(message) for message in messages])
dates_dict = {date: [m for m in messages if this_date(m) == date] for date in dates}
And yes, I know that the easiest way would be to simply add a new date field to each record then aggregate by that, but that's not what I want to do right now.
Thanks!
There is no way to accomplish what you're asking with mongodb's
aggregation framework, because there is no aggregation operator that
can turn ObjectId's into something date-like (there is a JIRA
ticket, though). You
should be able to accomplish what you want using map-reduce, however:
// map function
function domap() {
// turn ObjectId --> ISODate
var date = this._id.getTimestamp();
// format the date however you want
var year = date.getFullYear();
var month = date.getMonth();
var day = date.getDate();
// yields date string as key, entire document as value
emit(year+"-"+month+"-"+day, this);
}
// reduce function
function doreduce(datestring, docs) {
return {"date":datestring, "docs":docs};
}
So this doesn't answer my question directly, but I did find a better way to replace all that lambda nonsense above using Python's setdefault
:
d = {}
for message in messages:
key = message['_id'].generation_time.date()
d.setdefault(key,[]).append(message)
Thanks to @raymondh for the hint in is PyCon talk:
Transforming Code into Beautiful, Idiomatic Python
The Jira Ticket pointed out by llovett has been solved, so now you can use date operators like $isoWeek
and $year
to extract this information from an ObjectId
.
Your aggregation would look something like this:
{
"$project":
{
"_id": {
"$dateFromParts" : {
"year": { "$year": "$_id"},
"month": { "$month": "$_id"},
"day": { "$dayOfMonth": "$_id"}
}
}
}
}