Coming from an RDBMS background, I was always under the impression "Try as hard as you can to use one query, assuming it's efficient," meaning that it's costly for every request you make to the database. When it comes to MongoDB, it seems like this might not be possible because you can't join tables.
I understand that it's not supposed to be relational, but they're also pushing it for purposes like blogs, forums, and things I'd find an RDBMS easier to approach with.
There are some hang ups I've had trying to understand the efficiency of MongoDB or NoSQL in general. If I wanted to get all "posts" related to certain users (as if they were grouped)... using MySQL I'd probably do some joins and get it with that.
In MongoDB, assuming I need the collections separate, would it be efficient to use a large $in: ['user1', 'user2', 'user3', 'user4', ...] ?
Does that method get slow after a while? If I include 1000 users? And if I needed to get that list of posts related to users X,Y,Z, would it be efficient and/or fast using MongoDB to do:
- Get users array
- Get Posts IN users array
2 queries for one request. Is that bad practice in NoSQL?
To answer the Q about $in....
I did some performance tests with the following scenario:
~24 million docs in a collection
Lookup 1 million of those documents based on a key (indexed)
Using CSharp driver from .NET
Results:
Querying 1 at a time, single threaded : 109s
Querying 1 at a time, multi threaded : 48s
Querying 100K at a time using $in, single threaded=20s
Querying 100K at a time using $in, multi threaded=9s
So noticeably better performance using a large $in (restricted to max query size).
Update: Following on from comments below about how $in performs with different chunk sizes (queries multi-threaded):
Querying 10 at a time (100000 batches) = 8.8s
Querying 100 at a time (10000 batches) = 4.32s
Querying 1000 at a time (1000 batches) = 4.31s
Querying 10000 at a time (100 batches) = 8.4s
Querying 100000 at a time (10 batches) = 9s (per original results above)
So there does look to be a sweet-spot for how many values to batch up in to an $in clause vs. the number of round trips