Filter $lookup results

2019-05-19 07:50发布

问题:

I have 2 collections (with example documents):

reports

{
    id: "R1",
    type: "xyz",
}

reportfiles

{
    id: "F1",
    reportid: "R1",
    time: ISODate("2016-06-13T14:20:25.812Z")
},
{
    id: "F14",
    reportid: "R1",
    time: ISODate("2016-06-15T09:20:29.809Z")
}

As you can see one report may have multiple reportfiles.

I'd like to perform a query, matching a report id, returning the report document as is, plus an additional key storing as subdocument the reportfile with the most recent time (even better without reportid, as it would be redundant), e.g.

{
    id: "R1",
    type: "xyz",
    reportfile: {
        id: "F14",
        reportid: "R1",
        time: ISODate("2016-06-15T09:20:29.809Z")
    }
}

My problem here is that every report type has its own set of properties, so using $project in an aggregation pipeline is not the best way.

So far I got

db.reports.aggregate([{
            $match : 'R1'
        }, {
            $lookup : {
                from : 'reportfiles',
                localField : 'id',
                foreignField : 'reportid',
                as : 'reportfile'
            }
        }
    ])

returning of course as ´reportfile´ the list of all files with the given reportid. How can I efficiently filter that list to get the only element I need?

efficiently -> I tried using $unwind as next pipeline step but the resulting document was frighteningly and pointlessly long.

Thanks in advance for any suggestion!

回答1:

You need to add another $project stage to your aggregation pipeline after the $lookup stage.

{ "$project": { 
    "id": "R1",
    "type": "xyz",
    "reportfile": {
        "$let": {
            "vars": { 
                "obj": {   
                    "$arrayElemAt": [
                        { "$filter": { 
                            "input": "$reportfile", 
                            "as": "report", 
                            "cond": { "$eq": [ "$$report.time", { "$max": "$reportfile.time" } ] }
                        }},
                        0
                    ]
                }
            },
            "in": { "id": "$$obj.id", "time": "$$obj.time" }
        }
    }
}}

The $filter operator "filter" the $lookup result and return an array with the document that satisfy your condition. The condition here is $eq which return true when the document has the $maximum value.

The $arrayElemAt operator slice the $filter's result and return the element from the array that you then assign to a variable using the $let operator. From there, you can easily access the field you want in your result with the dot notation.



回答2:

What you would require is to run the aggregation operation on the reportfile collection, do the "join" on the reports collection, pipe a $group operation to ordered (with $sort) and flattened documents (with $unwind) from the $lookup pipeline. The preceding result can then be grouped by the reportid and output the desired result using the $first accumulator aoperators.


The following demonstrates this approach:

db.reportfiles.aggregate([
    { "$match": { "reportid": "R1" } },
    {
        "$lookup": {
            "from": 'reports',
            "localField" : 'reportid',
            "foreignField" : 'id',
            "as": 'report'
        }
    },
    { "$unwind": "$report" },
    { "$sort": { "time": -1 } },
    {
        "$group": {
            "_id": "$reportid",
            "type": { "$first": "$report.type" },
            "reportfile": {
                "$first": {
                    "id": "$id",
                    "reportid": "$reportid",
                    "time": "$time"
                }
            }
        }
    }
])

Sample Output:

{
    "_id" : "R1",
    "type" : "xyz",
    "reportfile" : {
        "id" : "F14",
        "reportid" : "R1",
        "time" : ISODate("2016-06-15T09:20:29.809Z")
    }
}