MongoDB aggregate group on inner child collection

2019-08-26 00:59发布

问题:

I have a User collection, which further have a 'UserSubscription' collection, which further have 'Subscription > Publication'.

Mongo collection looks like this

/* 1 */
{
    "_id" : 1,
    "UserSubscriptions" : [ 
    {
        "_id" : 1,
        "Subscription" : {
            "_id" : 1,
            "Publication" : {
                "_id" : 1,
                "Code" : "1MM",
            },
    },
    {
        "_id" : 2,
        "Subscription" : {
            "_id" : 2,
            "Publication" : {
                "_id" : 2,
                "Code" : "2MM",
            },      
    },
    {
        "_id" : 7,
        "Subscription" : {
            "_id" : 7,
            "Publication" : {
                "_id" : 1,
                "Code" : "1MM",
            },      
    }
]
}

/* 2 */
{
    "_id" : 2,
    "UserSubscriptions" : [ 
    {
        "_id" : 3,
        "Subscription" : {
            "_id" : 3,
            "Publication" : {
                "_id" : 1,
                "Code" : "1MM",
            }      
        }
    ]
}

/* 3 */
{
    "_id" : 3,
    "UserSubscriptions" : [ 
    {
        "_id" : 4,
        "Subscription" : {
            "_id" : 4,
            "Publication" : {
                "_id" : 1,
                "Code" : "1MM",
            }      
        }
    ]
}

/* 4 */
{
    "_id" : 4,
    "UserSubscriptions" : [ 
    {
        "_id" : 5,
        "Subscription" : {
            "_id" : 5,
            "Publication" : {
                "_id" : 2,
                "Code" : "2MM",
            }      
       }
   ]
}

I'm trying to get all 'Publications' and the Count (User subscribed to Subscription count). So from above collection i want result like this

 PublicationCode      Count (Number of users)
 1MM                  3
 2MM                  2 

I have tried following MongoDB query and getting result

db.runCommand( {
   aggregate: "User",
   pipeline: [
     {$unwind: '$UserSubscriptions'},
     {$group: {_id: '$_id',pub: {$addToSet: 
                '$UserSubscriptions.Subscription.Publication'}}},
     {$unwind: '$pub'},
     {$group: {_id: '$pub.Code',pub:{$first:'$pub'}, count: {$sum: 1}}},
     {$project:{_id:0,"Publication":"$pub","count":1}}
 ]} )

Corresponding C# Mongo driver code is

 var unwind = new BsonDocument { { "$unwind", "$UserSubscriptions" } };
        var group1 = new BsonDocument
            {
                { "$group",
                    new BsonDocument
                        {
                            { "_id", "$_id"
                            },
                            {
                                "publications", new BsonDocument
                                             {
                                                 {
                                                     "$addToSet","$UserSubscriptions.Subscription.Publication"
                                                 }
                                             }
                            }
                        }
              }
            };
        var unwindCode = new BsonDocument { { "$unwind", "$publications" } };
        var group2 = new BsonDocument
            {
                { "$group",
                    new BsonDocument
                        {
                            { "_id", "$publications.Code"
                            },
                            {
                                "Publications", new BsonDocument
                                             {
                                                 {
                                                     "$first","$publications"
                                                 }
                                             }
                            },
                            {
                                "NumberOfUsers", new BsonDocument
                                             {
                                                 {
                                                     "$sum",1
                                                 }
                                             }
                            }
                        }
              }
            };

        var project = new BsonDocument
        {
            {
                "$project",
                new BsonDocument
                    {
                        {"_id", 0},
                        {"Publication","$Publications"},
                        {"NumberOfUsers", 1},
                    }
            }
        };

        var pipeline = new[] { unwind, group1, unwindCode, group2, project };
        List<BsonDocument> docs= coll.Aggregate<BsonDocument>(pipeline);

Resulted 'Count' is coming correct with 'Publication' document. But the query is taking some time to return the result. For 50K Users records it is taking around 23 secs.

Please advice some way to improve the performance of the MongoDB query

(Robo 3T 1.2.1) (MongoDB.Driver 2.4.4)

回答1:

The main problem is that you need to go through whole collection (in this situation you can't use the indexes for the process speed up).

You can try this code (it have less operation that your):

db.getCollection('User').aggregate([
    {
        $unwind: "$UserSubscriptions"
    },
    {
        $group: {
            _id:"$UserSubscriptions.Subscription.Publication.Code",
            users: {$addToSet: "$_id"}
        }
    },
    {
        $project: {"PublicationCode": "$_id", "Count": {$size: "$users"}}
    }
])