Having this document structure (omitting irrelevant fields for brevity):
[
{
"_id" : 0,
"partn" : [
{
"date" : ISODate("2015-07-28T00:59:14.963Z"),
"is_partner" : true
},
{
"date" : ISODate("2015-07-28T01:00:32.771Z"),
"is_partner" : false
},
{
"date" : ISODate("2015-07-28T01:15:29.916Z"),
"is_partner" : true
},
{
"date" : ISODate("2015-08-05T13:48:07.035Z"),
"is_partner" : false
},
{
"date" : ISODate("2015-08-05T13:50:56.482Z"),
"is_partner" : true
}
]
},
{
"_id" : 149,
"partn" : [
{
"date" : ISODate("2015-07-30T12:42:18.894Z"),
"is_partner" : true
},
{
"date" : ISODate("2015-07-31T00:01:51.176Z"),
"is_partner" : false
}
]
}
]
I need to filter documents where the last (most recent) partn.is_partner
is true
, is this the best way to do it?
db.somedb
.aggregate([
// pre-filter only the docs with at least one is_partner === true, is it efficient/needed?
{$match: {partn: { $elemMatch: { is_partner: true } } } },
{$unwind: '$partn'},
// do I need to sort by _id too, here?
{$sort: {_id: 1, 'partn.date': 1} },
// then group back fetching the last one by _id
{$group : {
_id : '$_id',
partn: {$last: '$partn'},
}},
// and return only those with is_partner === true
{$match: {'partn.is_partner': true } },
])
I get what I need but, being a not-that-experient mongodb developer something feels like overhead in that aggregation. I thought about just fetching the last entry on each .partn
array, but the collection must be exported/imported sometimes, if I remember right the sort order can be changed - so aggregating and sorting by date could fail-proof that aspect.
Is this the best (most efficient) way to do it? If not, why?
Thanks. (Btw, this is MongoDB 2.6)
Mileage may vary on this and it may well turn out that "currently" the process you are following works out to be "most suited" at least. But we can probably do more efficient.
What you could do now
Provided your arrays are already "sorted" via using the $sort
modifier with $push
, then you can probably do this:
db.somedb.find(
{
"partn.is_partner": true,
"$where": function() {
return this.partn.slice(-1)[0].is_partner == true;
}
},
{ "partn": { "$slice": -1 } }
)
So as long as partn,is_partner
is "indexed" this is still pretty efficient as that initial query condition can be met using an index. The part that cannot is the $where
clause here that uses JavaScript evaluation.
But what that second part in the $where
is doing is simply "slicing" the last element from the array and testing it's value of the is_partner
property to see if it is true. Only if that condition is also met is the document returned.
There is also the $slice
projection operator. This does the same thing in returning the last element from the array. False matches are already filtered, so this is just showing only the last element where true.
Combined with with the index as mentioned, then this should be pretty quick given that the documents have been selected already and the JavaScript condition just filters the rest. Note that without another field with a standard query condition to match, a $where
clause cannot use an index. So always try to use "sparingly" with other query conditions in place.
What you can do in the future
Next Up, while not available at the time of writing, but certainly in the near future will be the $slice
operator for the aggregation framework. This is currently in the develpment branch, but here is a peek at how it works:
db.somedb.aggregate([
{ "$match": { "partn.is_partner": true } },
{ "$redact": {
"$cond": {
"if": {
"$anyElementTrue": {
"$map": {
"input": { "$slice": ["$partn",-1] },
"as": "el",
"in": "$$el.is_partner"
}
}
},
"then": "$$KEEP",
"else": "$$PRUNE"
}
}},
{ "$project": {
"partn": { "$slice": [ "$partn",-1 ] }
}}
])
Combining that $slice
within a $redact
stage here alows the documents to be filtered with a logical condition, testing the document. In this case the $slice
produces a single element array that is sent to $map
in order to just extract the single is_partner
value ( still as an array ). As this is still a single element array at best, the other test is $anyElementTrue
which makes this a singular boolean result, suitable for $cond
.
The $redact
here decides on that result whether to $$KEEP
or $$PRUNE
the document from the results. Later we use $slice
again in the project to just return the last element of the array after the filtering.
That works out to be pretty much exactly what the JavaScript version does, with the exception that this is using all native coded operators, and therefore should be a bit faster than the JavaScript alternate.
Both forms return your first document as expected:
{
"_id" : 0,
"partn" : [
{
"date" : ISODate("2015-07-28T00:59:14.963Z"),
"is_partner" : true
},
{
"date" : ISODate("2015-07-28T01:00:32.771Z"),
"is_partner" : false
},
{
"date" : ISODate("2015-07-28T01:15:29.916Z"),
"is_partner" : true
},
{
"date" : ISODate("2015-08-05T13:48:07.035Z"),
"is_partner" : false
},
{
"date" : ISODate("2015-08-05T13:50:56.482Z"),
"is_partner" : true
}
]
}
The big catch here with both is your array must already be sorted so the latest date is first. Without that, then you need the aggregation framework to $sort
the array, just as you are doing now.
Not really efficient, so that is why you should "pre-sort" your array and maintain the order on each update.
As a handy trick, this will actually re-order all the array elements in all collection documents in one simple statement:
db.somedb.update(
{},
{ "$push": {
"partn": { "$each": [], "$sort": { "date": 1 } }
}},
{ "multi": true }
)
So even if you are not "pushing" a new element into an array and just updating a property, you can always apply that basic construct to keep the array ordered how you want it.
Worth considering as it should make things much faster.