I have a collection that is a log of activity on objects, like this:
{
"_id" : ObjectId("55e3fd1d7cb5ac9a458b4567"),
"object_id" : "1",
"activity" : [
{
"action" : "test_action",
"time" : ISODate("2015-08-31T00:00:00.000Z")
},
{
"action" : "test_action",
"time" : ISODate("2015-08-31T00:00:22.000Z")
}
]
}
{
"_id" : ObjectId("55e3fd127cb5ac77478b4567"),
"object_id" : "2",
"activity" : [
{
"action" : "test_action",
"time" : ISODate("2015-08-31T00:00:00.000Z")
}
]
}
{
"_id" : ObjectId("55e3fd0f7cb5ac9f458b4567"),
"object_id" : "1",
"activity" : [
{
"action" : "test_action",
"time" : ISODate("2015-08-30T00:00:00.000Z")
}
]
}
If I run the following query:
db.objects.find({
"createddate": {$gte : ISODate("2015-08-30T00:00:00.000Z")},
"activity.action" : "test_action"
}).count()
it returns the count of documents containing "test_action" (3 in this set), but I need the count of all test_action entries (4 in this set). How do I do that?
The most "performant" way to do this is to skip the $unwind altogether and simply $group to count. Essentially, "filter" the arrays and take the $size of each result to $sum:
db.objects.aggregate([
{ "$match": {
"createddate": {
"$gte": ISODate("2015-08-30T00:00:00.000Z")
},
"activity.action": "test_action"
}},
{ "$group": {
"_id": null,
"count": {
"$sum": {
"$size": {
"$setDifference": [
{ "$map": {
"input": "$activity",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.action", "test_action" ] },
"$$el",
false
]
}
}},
[false]
]
}
}
}
}}
])
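To make the per-document expression easier to follow, here is a rough plain-JavaScript emulation of it (for Node.js, not something you would run against the server). The data is the sample from the question, with Date standing in for ISODate; `countMatching` is a hypothetical helper name used only for illustration. One point the sketch tries to surface: set operators such as $setDifference treat arrays as sets, so entries that are completely identical would collapse into one. Keying on JSON text below is only an approximation of how the server compares values.

```javascript
// Sample documents from the question (only the fields the count needs).
const docs = [
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") },
    { action: "test_action", time: new Date("2015-08-31T00:00:22.000Z") }
  ]},
  { object_id: "2", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") }
  ]},
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-30T00:00:00.000Z") }
  ]}
];

// Hypothetical helper mirroring the $map / $setDifference / $size expression.
function countMatching(activity, action) {
  // $map with $cond: keep the element when it matches, otherwise false.
  const mapped = activity.map(el => (el.action === action ? el : false));
  // $setDifference against [false]: drop the false placeholders.
  // Set semantics also collapse identical entries (approximated via JSON text).
  const unique = new Map();
  for (const el of mapped) {
    if (el !== false) unique.set(JSON.stringify(el), el);
  }
  // $size: number of entries left.
  return unique.size;
}

// $group with _id: null and $sum: add up the per-document sizes.
const total = docs.reduce(
  (sum, doc) => sum + countMatching(doc.activity, "test_action"),
  0
);
console.log(total); // 4
```

In the sample data every entry has a distinct time, so nothing collapses and the total is 4. If your array can hold fully identical entries, expect them to be counted once by the set-based form.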
Newer releases of MongoDB (3.2 and later) have $filter, which makes this much simpler:
db.objects.aggregate([
{ "$match": {
"createddate": {
"$gte": ISODate("2015-08-30T00:00:00.000Z")
},
"activity.action": "test_action"
}},
{ "$group": {
"_id": null,
"count": {
"$sum": {
"$size": {
"$filter": {
"input": "$activity",
"as": "el",
"cond": {
"$eq": [ "$$el.action", "test_action" ]
}
}
}
}
}
}}
])
Using $unwind causes the documents to de-normalize and effectively creates a copy per array entry. Where possible you should avoid this due to the often extreme cost. Filtering and counting array entries per document is much faster by comparison, as is a simple $match and $group pipeline compared to many stages.
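As a rough illustration of that difference, the sketch below mimics both approaches in plain JavaScript (Node.js 11 or later for `flatMap`), using the activity arrays from the question. It is only a model of the data flow, not a benchmark and not how the server is implemented; the variable names are made up for the example.

```javascript
// Activity arrays from the sample documents in the question.
const docs = [
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") },
    { action: "test_action", time: new Date("2015-08-31T00:00:22.000Z") }
  ]},
  { object_id: "2", activity: [
    { action: "test_action", time: new Date("2015-08-31T00:00:00.000Z") }
  ]},
  { object_id: "1", activity: [
    { action: "test_action", time: new Date("2015-08-30T00:00:00.000Z") }
  ]}
];

// $unwind model: every array entry becomes its own copy of the parent document.
const unwound = docs.flatMap(doc =>
  doc.activity.map(entry => ({ ...doc, activity: entry }))
);

// Follow-up $match and $group { $sum: 1 } over the unwound copies.
const viaUnwind = unwound.filter(
  d => d.activity.action === "test_action"
).length;

// Filter-and-size model: each document stays whole and contributes one number.
const viaFilter = docs.reduce(
  (sum, doc) =>
    sum + doc.activity.filter(el => el.action === "test_action").length,
  0
);

console.log(docs.length, unwound.length); // 3 4
console.log(viaUnwind, viaFilter);        // 4 4
```

Both routes give the same count, but the unwind route has to materialize one intermediate document per array entry first, which is where the cost grows as the arrays get longer.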
You can do so by using aggregation:
db.objects.aggregate([
{$match: {"createddate": {$gte : ISODate("2015-08-30T00:00:00.000Z")}, "activity.action" : "test_action"}},
{$unwind: "$activity"},
{$match: {"activity.action" : "test_action"}},
{$group: {_id: null, count: {$sum: 1}}}
])
This will produce a result like:
{
"_id" : null,
"count" : 4
}