I have this query in MongoDB:
db.emailGroup.aggregate([
{
"$lookup":
{
"from": "link",
"localField": "_id",
"foreignField": "emailGroupId",
"as": "link"
},
},
{
"$unwind": "$link"
},
{
"$match": {
'link.originalLink': ""
}
},
{
"$group" : {
_id: '$_id',
link: {
$push: '$link'
}
}
},
{
"$project": {
"size": {
"$sum": {
"$map": {
"input": "$link",
"as": "l",
"in": {
"$size": {
"$ifNull": [
"$$l.linkHistory", []
]
}
}
}
}
}
}
}
])
EmailGroup has partId field. I use $lookup to "join" other collection for sum her fields. I need group by partId field and sum custom field "size" for partId groups. Is this possible? Extra question: How can I add emailGroup fields to query result?
Sample documents:
emailGroup:
{
"_id" : ObjectId("594a6c47f51e075db713ccb6"),
"partId" : "f56c7c71eb14a20e6129a667872f9c4f",
}
link:
{
"_id" : ObjectId("594b96d6f51e075db67c44c9"),
"originalLink" : "",
"emailGroupId" : ObjectId("594a6c47f51e075db713ccb6"),
"linkHistory" : [
{
"_id" : ObjectId("594b96f5f51e075db713ccdf"),
},
{
"_id" : ObjectId("594b971bf51e075db67c44ca"),
}
]
}
Annotation for those looking for - Foreign Count
A bit better than was originally answered is to actually use the newer form of $lookup
from MongoDB 3.6. This can actually do the "counting" within the "sub-pipeline" expression as opposed to returning an "array" for subsequent filtering and counting or even using $unwind
db.emailGroup.aggregate([
{ "$lookup": {
"from": "link",
"let": { "id": "$_id" },
"pipeline": [
{ "$match": {
"originalLink": "",
"$expr": { "$eq": [ "$$id", "$_id" ] }
}},
{ "$count": "count" }
],
"as": "linkCount"
}},
{ "$addFields": {
"linkCount": { "$sum": "$linkCount.count" }
}}
])
Not what the original question was asking for but part of the below answer in the now most optimal form, as of course the result of $lookup
is reduced to the "matched count" only instead of "all matched documents".
Original
The correct way to do this would be to add the "linkCount"
to the $group
stage as well as a $first
on any additional fields of the parent document in order to get the "singular" form as was the state "before" the $unwind
was processed on the array that was the result of $lookup
:
All Detail
db.emailGroup.aggregate([
{ "$lookup": {
"from": "link",
"localField": "_id",
"foreignField": "emailGroupId",
"as": "link"
}},
{ "$unwind": "$link" },
{ "$match": { "link.originalLink": "" } },
{ "$group": {
"_id": "$_id",
"partId": { "$first": "$partId" },
"link": { "$push": "$link" },
"linkCount": {
"$sum": {
"$size": {
"$ifNull": [ "$link.linkHistory", [] ]
}
}
}
}}
])
Produces:
{
"_id" : ObjectId("594a6c47f51e075db713ccb6"),
"partId" : "f56c7c71eb14a20e6129a667872f9c4f",
"link" : [
{
"_id" : ObjectId("594b96d6f51e075db67c44c9"),
"originalLink" : "",
"emailGroupId" : ObjectId("594a6c47f51e075db713ccb6"),
"linkHistory" : [
{
"_id" : ObjectId("594b96f5f51e075db713ccdf")
},
{
"_id" : ObjectId("594b971bf51e075db67c44ca")
}
]
}
],
"linkCount" : 2
}
Group By partId
db.emailGroup.aggregate([
{ "$lookup": {
"from": "link",
"localField": "_id",
"foreignField": "emailGroupId",
"as": "link"
}},
{ "$unwind": "$link" },
{ "$match": { "link.originalLink": "" } },
{ "$group": {
"_id": "$partId",
"linkCount": {
"$sum": {
"$size": {
"$ifNull": [ "$link.linkHistory", [] ]
}
}
}
}}
])
Produces
{
"_id" : "f56c7c71eb14a20e6129a667872f9c4f",
"linkCount" : 2
}
The reason you do it this way with a $unwind
and then a $match
is because of how MongoDB actually handles the pipeline when issued in that order. This is what happens to the $lookup
as is demonstrated the the "explain"
output from the operation:
{
"$lookup" : {
"from" : "link",
"as" : "link",
"localField" : "_id",
"foreignField" : "emailGroupId",
"unwinding" : {
"preserveNullAndEmptyArrays" : false
},
"matching" : {
"originalLink" : {
"$eq" : ""
}
}
}
},
{
"$group" : {
I'm leaving the part with $group
in that output to demonstrate that the other two pipeline stages "disappear". This is because they have been "rolled-up" into the $lookup
pipeline stage as shown. This is in fact how MongoDB deals with the possibility that the BSON Limit can be exceeded by the result of "joining" results of $lookup
into an array of the parent document.
You can alternately write the operation like this:
All Detail
db.emailGroup.aggregate([
{ "$lookup": {
"from": "link",
"localField": "_id",
"foreignField": "emailGroupId",
"as": "link"
}},
{ "$addFields": {
"link": {
"$filter": {
"input": "$link",
"as": "l",
"cond": { "$eq": [ "$$l.originalLink", "" ] }
}
},
"linkCount": {
"$sum": {
"$map": {
"input": {
"$filter": {
"input": "$link",
"as": "l",
"cond": { "$eq": [ "$$l.originalLink", "" ] }
}
},
"as": "l",
"in": { "$size": { "$ifNull": [ "$$l.linkHistory", [] ] } }
}
}
}
}}
])
Group by partId
db.emailGroup.aggregate([
{ "$lookup": {
"from": "link",
"localField": "_id",
"foreignField": "emailGroupId",
"as": "link"
}},
{ "$addFields": {
"link": {
"$filter": {
"input": "$link",
"as": "l",
"cond": { "$eq": [ "$$l.originalLink", "" ] }
}
},
"linkCount": {
"$sum": {
"$map": {
"input": {
"$filter": {
"input": "$link",
"as": "l",
"cond": { "$eq": [ "$$l.originalLink", "" ] }
}
},
"as": "l",
"in": { "$size": { "$ifNull": [ "$$l.linkHistory", [] ] } }
}
}
}
}},
{ "$unwind": "$link" },
{ "$group": {
"_id": "$partId",
"linkCount": { "$sum": "$linkCount" }
}}
])
Which has the same output but "differs" from the first query in that the $filter
here is applied "after" ALL results of the $lookup
are returned into the new array of the parent document.
So in performance terms, it is actually more effective to do it the first way as well as being portable to possible large result sets "before filtering" which would otherwise break the 16MB BSON Limit.
Note: If you do not have $addFields
available (added with MongoDB 3.4) then use $project
and specify "ALL" of the fields you wish to return.
As a side-note for those who are interested, in future releases of MongoDB ( presumably 3.6 and up ) you can use $replaceRoot
instead of an $addFields
with usage of the new $mergeObjects
pipeline operator. The advantage of this is as a "block", we can declare the "filtered"
content as a variable via $let
, which means you do not need to write the same $filter
"twice":
db.emailGroup.aggregate([
{ "$lookup": {
"from": "link",
"localField": "_id",
"foreignField": "emailGroupId",
"as": "link"
}},
{ "$replaceRoot": {
"newRoot": {
"$mergeObjects": [
"$$ROOT",
{ "$let": {
"vars": {
"filtered": {
"$filter": {
"input": "$link",
"as": "l",
"cond": { "$eq": [ "$$l.originalLink", "" ] }
}
}
},
"in": {
"link": "$$filtered",
"linkCount": {
"$sum": {
"$map": {
"input": "$$filtered.linkHistory",
"as": "lh",
"in": { "$size": { "$ifNull": [ "$$lh", [] ] } }
}
}
}
}
}}
]
}
}}
])
Nonetheless the best way to do such "filtered" $lookup
operations is "still" at this time using the $unwind
then $match
pattern, until such time as you can provide query arguments to $lookup
directly.
Note: By "directly" here I do not mean the "non-correlated" form of $lookup
which should also be in the MongoDB 3.6 release, as this would indeed issue another "pipeline" execution for each document in the parent collection. So that feature still does not replace the present "best effective" way of retrieving only the matched items.