How to join to two additional collections with con

2019-01-26 17:29发布

问题:

select tb1.*,tb3 from tb1,tb2,tb3
 where tb1.id=tb2.profile_id and tb2.field='<text>'
 and tb3.user_id = tb2.id and tb3.status =0

actually i converted the sql as mongo sql as follows

mongo sql which i used

db.getCollection('tb1').aggregate
([
  { $lookup: 
           { from: 'tb2', 
             localField: 'id', 
             foreignField: 'profile_id', 
             as: 'tb2detail' 
            } 
   },

   { $lookup: 
            { from: 'tb3', 
              localField: 'tb2.id', 
              foreignField: 'user_id', 
              as: 'tb3details' 
            } 
    },

{ $match: 
        { 'status': 
                  { '$ne': 'closed' 
                  }, 
          'tb2.profile_type': 'agent', 
          'tb3.status': 0 
         } 
}

])

but not getting as per the expected result..

Any help will be appreciated..

回答1:

What you are missing in here is that $lookup produces an "array" in the output field specified by as in it's arguments. This is the general concept of MongoDB "relations", in that a "relation" between documents is represented as a "sub-property" that is "within" the document itself, being either singular or an "array" for many.

Since MongoDB is "schemaless", the general presumption of $lookup is that you mean "many" and the result is therefore "always" an array. So looking for the "same result as in SQL" then you need to $unwind that array after the $lookup. Whether it's "one" or "many" is of no consequence, since it's still "always" an array:

db.getCollection.('tb1').aggregate([
  // Filter conditions from the source collection
  { "$match": { "status": { "$ne": "closed" } }},

  // Do the first join
  { "$lookup": {
    "from": "tb2",
    "localField": "id",
    "foreignField": "profileId",
    "as": "tb2"
  }},

  // $unwind the array to denormalize
  { "$unwind": "$tb2" },

  // Then match on the condtion for tb2
  { "$match": { "tb2.profile_type": "agent" } },

  // join the second additional collection
  { "$lookup": {
    "from": "tb3",
    "localField": "tb2.id",
    "foreignField": "id",
    "as": "tb3"
  }},

  // $unwind again to de-normalize
  { "$unwind": "$tb3" },

  // Now filter the condition on tb3
  { "$match": { "tb3.status": 0 } },

  // Project only wanted fields. In this case, exclude "tb2"
  { "$project": { "tb2": 0 } }
])

Here you need to note the other things you are missing in the translation:

Sequence is "important"

Aggregation pipelines are more "tersely expressive" than SQL. They are in fact best considered as "a sequence of steps" applied to the datasource in order to collate and transform the data. The best analog to this is "piped" command line instructions, such as:

ps -ef  | grep mongod | grep -v grep | awk '{ print $1 }'

Where the "pipe" | can be considered as a "pipeline stage" in a MongoDB aggregation "pipeline".

As such we want to $match in order to filter things from the "source" collection as our first operation. And this is generally good practice since it removes any documents that did not meet required conditions from further conditions. Just like what is happening in our "command line pipe" example, where we take "input" then "pipe" to a grep to "remove" or "filter".

Paths Matter

Where the very next thing you do here is "join" via $lookup. The result is an "array" of the items from the "from" collection argument matched by the supplied fields to output in the "as" "field path" as an "array".

The naming chosen here is important, since now the "document" from the source collection considers all items from the "join" to now exist at that given path. To make this easy, I use the same "collection" name as the "join" for the new "path".

So starting from the first "join" the output is to "tb2" and that will hold all the results from that collection. There is also an important thing to note with the following sequence of $unwind and then $match, as to how MongoDB actually processes the query.

Certain Sequences "really" matter

Since it "looks like" there are "three" pipeline stages, being $lookup then $unwind and then $match. But in "fact" MongoDB really does something else, which is demonstrated in the output of { "explain": true } added to the .aggregate() command:

    {
        "$lookup" : {
            "from" : "tb2",
            "as" : "tb2",
            "localField" : "id",
            "foreignField" : "profileId",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "profile_type" : {
                    "$eq" : "agent"
                }
            }
        }
    }, 
    {
        "$lookup" : {
            "from" : "tb3",
            "as" : "tb3",
            "localField" : "tb2.id",
            "foreignField" : "id",
            "unwinding" : {
                "preserveNullAndEmptyArrays" : false
            },
            "matching" : {
                "status" : {
                    "$eq" : 0.0
                }
            }
        }
    }, 

So aside from the first point of "sequence" applying where you need to put the $match statements where they are needed and do the "most good", this actually becomes "really important" with the concept of "joins". The thing to note here is that our sequences of $lookup then $unwind and then $match, actually get processed by MongoDB as just the $lookup stages, with the other operations "rolled up" into the one pipeline stage for each.

This is an important distinction to other ways of "filtering" the results obtained by $lookup. Since in this case, the actual "query" conditions on the "join" from $match are performed on the collection to join "before" the results are returned to the parent.

This in combination with $unwind ( which is translated into unwinding ) as shown above is how MongoDB actually deals with the possibility that the "join" could result in producing an array of content in the source document which causes it to exceed the 16MB BSON limit. This would only happen in cases where the result being joined to is very large, but the same advantage is in where the "filter" is actually applied, being on the target collection "before" results are returned.

It is that kind of handling that best "correlates" to the same behavior as a SQL JOIN. It is also therefore the most effective way to obtain results from a $lookup where there are other conditions to apply to the JOIN aside from simply the "local" of "foreign" key values.

Also note that the other behavior change is is from what is essentially a LEFT JOIN performed by $lookup where the "source" document would always be retained regardless of the presence of a matching document in the "target" collection. Instead the $unwind adds to this by "discarding" any results from the "source" which did not have anything matching from the "target" by the additional conditions in $match.

In fact they are even discarded beforehand due to the implied preserveNullAndEmptyArrays: false which is included and would discard anything where the "local" and "foreign" keys did not even match between the two collections. This is a good thing for this particular type of query as the "join" is intended to the "equal" on those values.

Conclude

As noted before, MongoDB generally treats "relations" a lot differently to how you would use a "Relational Database" or RDBMS. The general concept of "relations" is in fact "embedding" the data, either as a single property or as an array.

You may actually desire such output, which is also part of the reason why that without the $unwind sequence here the output of $lookup is actually an "array". However using $unwind in this context is actually the most effective thing to do, as well as giving a guarantee that the "joined" data does not actually cause the aforementioned BSON limit to be exceed as a result of that "join".

If you actually want arrays of output, then the best thing to do here would be to use the $group pipeline stage, and possibly as multiple stages in order to "normalize" and "undo the results" of $unwind

  { "$group": {
    "_id": "$_id",
    "tb1_field": { "$first": "$tb1_field" },
    "tb1_another": { "$first": "$tb1_another" },
    "tb3": { "$push": "$tb3" }    
  }}

Where you would in fact for this case list all the fields you required from "tb1" by their property names using $first to only keep the "first" occurrence ( essentially repeated by results of "tb2" and "tb3" unwound ) and then $push the "detail" from "tb3" into an "array" to represent the relation to "tb1".

But the general form of the aggregation pipeline as given is the exact representation of how results would be obtained from the original SQL, which is "denormalized" output as a result of the "join". Whether you want to "normalize" results again after this is up to you.