Is MongoDB capable of finding a number of random documents without making multiple queries?
For example, I implemented this on the JS side after loading all the documents in the collection, which is wasteful, so I wanted to check whether it can be done better with a single db query.
The path I took on the JS side:
- get all data
- make an array of the IDs
- shuffle array of IDs (random order)
- splice the array down to the number of documents required
- create a list of documents by selecting them, one by one, from the whole collection using the IDs left after the two previous operations
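For reference, the steps above can be sketched in plain JavaScript roughly as follows. This is only an illustration of the client-side approach, not the asker's actual code: `shuffle` and `pickRandomDocuments` are hypothetical helper names, `allDocs` stands for the already-loaded collection, and `slice` is used where the list says splice.

```javascript
// Fisher-Yates shuffle: returns a new array in uniformly random order.
function shuffle(items) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Client-side sampling: every document has to be loaded before this runs.
function pickRandomDocuments(allDocs, count) {
  const ids = allDocs.map((doc) => doc._id);        // array of the IDs
  const chosen = shuffle(ids).slice(0, count);      // random order, then keep `count`
  const byId = new Map(allDocs.map((doc) => [doc._id, doc]));
  return chosen.map((id) => byId.get(id));          // select documents by ID
}
```

The cost is visible in the signature: `allDocs` must already hold the whole collection.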
The two major drawbacks are that I either load all the data or make multiple queries.
Any suggestions are much appreciated.
Here is what I came up with in the end:
This was answered a long time ago and, since then, MongoDB has greatly evolved.
As posted in another answer, MongoDB has supported sampling within the Aggregation Framework since version 3.2:
The way you could do this is:
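A minimal sketch of a plain `$sample` call, assuming `collection` is anything that exposes `aggregate()` (for example `db.products` in the mongo shell, where `products` is a hypothetical collection name). `sampleDocuments` is a helper name made up for this example.

```javascript
// Ask the server for `size` random documents in a single aggregation.
// Assumption: `collection` has an aggregate() method (mongo shell or driver collection).
function sampleDocuments(collection, size) {
  return collection.aggregate([{ $sample: { size: size } }]);
}

// mongo shell usage (hypothetical collection name):
// sampleDocuments(db.products, 10)
```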
Or:
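A sketch of the filtered variant, where a `$match` stage narrows the documents before `$sample` picks from them. As before, `collection` is assumed to expose `aggregate()`, and `sampleMatching` and the `category` field in the usage line are hypothetical names.

```javascript
// Sample `size` random documents among those matching `filter`.
// Assumption: `collection` has an aggregate() method.
function sampleMatching(collection, filter, size) {
  return collection.aggregate([
    { $match: filter },          // keep only the documents of interest
    { $sample: { size: size } }, // then pick randomly among them
  ]);
}

// mongo shell usage (hypothetical collection and field names):
// sampleMatching(db.products, { category: "electronics" }, 10)
```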
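However, there are some caveats about the $sample operator (as of Nov 6th, 2017, when the latest version is 3.4). According to the MongoDB documentation, $sample uses a pseudo-random cursor to select documents only if all of the following hold:
- $sample is the first stage of the pipeline
- the requested sample size is less than 5% of the total documents in the collection
- the collection contains more than 100 documents
If any of these is not met, $sample performs a collection scan followed by a random sort to select the documents, as in the last example with the $match.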
OLD ANSWER
You could always run:
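A rough sketch of the skip-based idea: pick a random offset and read a few documents from there. `randomSkip` is a hypothetical helper, and the collection and field names in the usage comment are assumptions.

```javascript
// Pick a random starting offset such that `limit` documents still fit after it.
function randomSkip(collectionSize, limit) {
  const maxOffset = Math.max(collectionSize - limit, 0);
  return Math.floor(Math.random() * (maxOffset + 1));
}

// mongo shell usage (hypothetical names; count() on older shells):
// var YOUR_COLLECTION_SIZE = db.products.countDocuments({ category: "electronics" });
// db.products.find({ category: "electronics" })
//   .skip(randomSkip(YOUR_COLLECTION_SIZE, 5))
//   .limit(5);
```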
But the order won't be random, and you will need either two queries (one count to get YOUR_COLLECTION_SIZE) or an estimate of how big the collection is (about 100 records, about 1000, about 10000...).
You could also add a field with a random number to all documents and query by that number. The drawback here is that you will get the same results every time you run the same query. To fix that you can always play with limit and skip, or even with sort. You could also update those random numbers every time you fetch a record (which implies more queries).
I don't know if you are using Mongoose, Mongoid, or the Mongo driver directly for a specific language, so I'll write everything for the mongo shell.
Thus a product record, let's say, would look like this:
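As an illustration, such a document might look like the object below. Only the `_random_sample` field belongs to the technique; `name` and `category` are made-up example fields.

```javascript
// Hypothetical product document carrying a random number for sampling.
const product = {
  name: "Example laptop",        // made-up field
  category: "electronics",       // made-up field
  _random_sample: Math.random(), // value in [0, 1), assigned when the document is written
};
```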
and I would suggest using:
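Since the queries filter on `_random_sample`, one reasonable assumption is that the field should be indexed. A minimal sketch of that, with `indexRandomSample` as a hypothetical helper and `collection` assumed to expose `createIndex()`:

```javascript
// Assumption: an ascending single-field index on _random_sample helps range queries on it.
function indexRandomSample(collection) {
  return collection.createIndex({ _random_sample: 1 });
}

// mongo shell usage (hypothetical collection name):
// indexRandomSample(db.products)
```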
Then you could do:
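One way this query could look: combine the normal criteria with a random threshold on `_random_sample`. `randomSampleFilter` is a hypothetical helper, and the `category` field in the usage comment is an assumption.

```javascript
// Build a filter that keeps the caller's criteria and adds a random lower bound
// on _random_sample. The base filter is copied, not modified.
function randomSampleFilter(baseFilter) {
  return Object.assign({}, baseFilter, {
    _random_sample: { $gte: Math.random() },
  });
}

// mongo shell usage (hypothetical names):
// db.products.find(randomSampleFilter({ category: "electronics" })).limit(5)
```

A threshold close to 1 can match very few documents, so a caller may need to retry or fall back to `$lte`.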
Then you could periodically run an update that refreshes the documents' _random_sample field:
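A sketch of such a refresh that gives each document its own fresh value, built as a list of `bulkWrite` operations. `buildRandomSampleUpdates` is a hypothetical helper, and `ids` is assumed to be an array of `_id` values collected beforehand.

```javascript
// Build one updateOne operation per document so that each one
// receives a different random number.
function buildRandomSampleUpdates(ids) {
  return ids.map((id) => ({
    updateOne: {
      filter: { _id: id },
      update: { $set: { _random_sample: Math.random() } },
    },
  }));
}

// mongo shell usage (hypothetical collection name):
// db.products.bulkWrite(buildRandomSampleUpdates(ids))
```

The same helper could be called with the IDs of documents that were just retrieved, instead of running on a schedule.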
Or, whenever you retrieve some records, you could update all of them or just a few (depending on how many records you've retrieved).
EDIT
Be aware that a single multi-document update which sets _random_sample to Math.random() won't work very well, as it will update every product that matches your query with the same random number. The last approach works better (updating some documents as you retrieve them).
skip didn't work out for me. Here is what I wound up with:
gets a single random result, matching the criteria.
Since 3.2 there is an easier way to get a random sample of documents from a collection:
Source: MongoDB Docs
In this case: