I have a huge collection where each document has subdocuments that have relations among them. My schema looks like this:
{
userName: "user44",
userID: "44",
posts : [
...
{
title : "post1",
id : "123"
...
},
{
title : "post2",
id : "124"
...
},
...
],
comments: [
...
{
id: 1910,
postId : "123",
title : "comment1",
comment : "some comment",
user: "user13"
},
{
id: 1911,
postId : "124",
title : "comment2",
comment : "some comment",
user: "user22"
},
...
],
commentUpvotes: [
...
{
id : 12,
commentId : "1910",
upvotedBy: "user91"
},
{
id: 13,
commentId : "1910",
upvotedBy: "user92"
},
...
]
}
Although this has nothing to do with my database, original schema is exactly as above. So, this example above is a user collection, where I store posts
of the user; comments
that made to the posts by other users, commentUpvotes
to store information about who upvoted. Don't think about the logic of its design, and don't please suggest any other schema.
Question: db.users.find({"commentUpvotes.id" : 12})
should return this collection, but only with the comment(1910) and post(123) that this upvote made to. I solved it with $unwinding which caused performance problems. Therefore please suggest to solve it without unwinding. Any ideas on that?
Considering the "indentation" I am using in the listing, this may actually look longer than what you are doing, but really it isn't.
This is a another really good example of using $map
as available to MongoDB 2.6 and greater. There is still some use of $unwind
, but the arrays being "unwound" actually only ever have one element in them. So please forgive my "Highlander" references which I could not resist :)
db.users.aggregate([
// Match your document or documents
{ "$match": {
"commentUpvotes.id": 12
}},
// Get the one "up-votes" entry that matches
{ "$project": {
"posts": 1,
"comments": 1,
"commentUpVotes": {
"$setDifference": [
{
"$map": {
"input": "$commentUpvotes",
"as": "el",
"in": {
"$cond": [
{ "$eq": [ "$$el.id", 12 ] },
"$$el",
false
]
}
}
},
[false]
]
}
}},
// There is only one!
{ "$unwind": "$commentUpVotes" },
// Get the one comments entry that matches
{ "$project": {
"posts": 1,
"comments": {
"$setDifference": [
{
"$map": {
"input": "$comments",
"as": "el",
"in": {
"$cond": [
{
"$eq": [
{ "$substr": [ "$$el.id", 0, 4 ] },
"$commentUpVotes.commentId"
]
},
"$$el",
false
]
}
}
},
[false]
]
},
"commentUpVotes": 1
}},
// And there is only one!
{ "$unwind": "$comments" },
// Get the one post that matches
{ "$project": {
"posts": {
"$setDifference": [
{
"$map": {
"input": "$posts",
"as": "el",
"in": {
"$cond": [
{
"$eq": [
"$$el.id",
"$comments.postId"
]
},
"$$el",
false
]
}
}
},
[false]
]
},
"comments": 1,
"commentUpVotes": 1
}},
// Optionally group back to arrays. There can be only one!
{ "$group": {
"_id": "$_id",
"posts": { "$first": "$posts" },
"comments": { "$push": "$comments" },
"commentUpVotes": { "$push": "$commentUpVotes" }
}}
])
So the end result would be:
{
"_id" : ObjectId("539065d3cd0f2aac5f55778e"),
"posts" : [
{
"title" : "post1",
"id" : "123"
}
],
"comments" : [
{
"id" : 1910,
"postId" : "123",
"title" : "comment1",
"comment" : "some comment",
"user" : "user13"
}
],
"commentUpVotes" : [
{
"id" : 12,
"commentId" : "1910",
"upvotedBy" : "user91"
}
]
}
I know you asked for "no schema changes", but not really a schema change to say it is a good idea to keep your id
values here of a consistent type. Currently you are mixing Integers and strings in this process ( I hope it's just an example ) which is not a good idea.
Thus there is some "limited casting" that actually is available employed here using $substr
, however your actual solution may vary in how to really do this. I strongly suggest fixing the data if it really does need fixing.
At any rate, a pretty cool usage for $map