I am looking to get a random record from a huge (100 million record) MongoDB collection.
What is the fastest and most efficient way to do so? The data is already there, and there is no field holding a random number that I could query on to obtain a random row.
Any suggestions?
If you have a simple id key, you could store all the ids in an array and then pick a random id (Ruby answer):
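The same idea as a minimal Python sketch. It assumes the ids have already been loaded into a list. The helper `pick_random_id` and the sample `ids` are hypothetical. For 100 million documents, that list alone takes a lot of memory.

```python
import random


def pick_random_id(ids, rng=random):
    """Return one id chosen uniformly at random from an in-memory list of ids.

    `ids` is assumed to have been loaded beforehand (e.g. a projection on _id);
    keeping 100 million ids in memory is expensive.
    """
    if not ids:
        raise ValueError("no ids to choose from")
    return rng.choice(ids)


# Hypothetical ids; a seeded generator makes the pick reproducible.
ids = [101, 205, 307, 412]
chosen = pick_random_id(ids, random.Random(42))
print(chosen)
```

With pymongo, the chosen id would then be fetched with `collection.find_one({"_id": chosen})`.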
Since MongoDB 3.2 you can use the $sample stage of the aggregation pipeline. Example:
See the $sample documentation.
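A sketch of the same pipeline from Python. It assumes the pymongo driver, with `collection` being a pymongo `Collection`. The helper names are hypothetical. According to the MongoDB documentation, `$sample` avoids a full scan only when it is the first stage, the collection has more than 100 documents, and the requested size is under 5% of the collection. Otherwise it reads everything and performs a random sort.

```python
def sample_pipeline(size=1):
    """Aggregation pipeline returning `size` randomly chosen documents.

    $sample is kept as the only (first) stage so the server can use its
    pseudo-random cursor instead of scanning and sorting the collection.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    return [{"$sample": {"size": size}}]


def random_documents(collection, size=1):
    """Run the pipeline on a pymongo-style collection (assumed) and return a list."""
    return list(collection.aggregate(sample_pipeline(size)))
```

Usage would look like `random_documents(MongoClient()["mydb"]["mycoll"], size=1)`, where the database and collection names are placeholders.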
Do a count of all records, generate a random number between 0 and count - 1 (inclusive), and then do:
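A minimal Python sketch of count + skip. It assumes `collection` behaves like a pymongo `Collection`, and the helper name is hypothetical. The skip is still executed on the server, so its cost grows with the offset. On 100 million documents a large offset can be slow.

```python
import random


def random_document_by_skip(collection, rng=random):
    """Count + skip: return one uniformly chosen document, or None if empty."""
    total = collection.count_documents({})
    if total == 0:
        return None
    # randrange(total) yields 0 <= offset <= total - 1, so skip() never
    # goes past the last document.
    offset = rng.randrange(total)
    # Sorting on _id gives every offset a well-defined position; skip()
    # still walks over `offset` index entries on the server.
    cursor = collection.find().sort("_id", 1).skip(offset).limit(1)
    for doc in cursor:
        return doc
    # Documents may have been deleted between the count and the find.
    return None
```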
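You can pick a random timestamp and search for the first object that was created after it. It only reads a single document, though it doesn't necessarily give you a uniform distribution: a document created right after a long gap in inserts is more likely to be picked than one created during a busy period.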
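One way to realise this in Python, as a sketch. It assumes documents use default ObjectId `_id` values, which embed their creation time as a 4-byte big-endian Unix timestamp. The helpers below are hypothetical. They build the smallest ObjectId (as hex) for a random second, which is the same boundary bson's `ObjectId.from_datetime` produces. The non-uniformity described above still applies.

```python
import random
from datetime import datetime, timedelta, timezone


def random_time_between(start, end, rng=random):
    """Uniformly random datetime in [start, end]."""
    span = (end - start).total_seconds()
    return start + timedelta(seconds=rng.uniform(0, span))


def objectid_hex_for_time(dt):
    """Smallest ObjectId, as 24 hex chars, whose embedded timestamp is dt.

    The first 4 bytes are the Unix time in seconds; the remaining 8 bytes
    are zero. Naive datetimes are treated as UTC, like from_datetime does.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    seconds = int(dt.timestamp())
    return format(seconds, "08x") + "0" * 16
```

With pymongo, the lookup would be `collection.find_one({"_id": {"$gte": ObjectId(hex_id)}}, sort=[("_id", 1)])`. If it returns None, the random time fell after the newest document, and you can simply draw again.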
If you're using Mongoid, the object-document mapper for Ruby, you can do the following (assuming your model is User):
In my .irbrc I have:
so in the Rails console I can do, for example:
to get random documents from any collection.
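A Python counterpart of such a console helper, as a sketch. It accepts any pymongo-style collection (assumed) and returns up to `n` distinct random documents. It draws distinct offsets and fetches each one with skip. The name `random_docs` is hypothetical. Each fetch still costs time proportional to its offset, so this suits interactive exploration rather than hot paths.

```python
import random


def random_docs(collection, n=1, rng=random):
    """Return up to n distinct random documents from any collection.

    Offsets are drawn without replacement, so no document is returned twice
    as long as the collection is not modified while this runs.
    """
    total = collection.count_documents({})
    k = min(n, total)
    offsets = rng.sample(range(total), k)
    docs = []
    for offset in offsets:
        # Sort on _id so each offset points at a well-defined document.
        cursor = collection.find().sort("_id", 1).skip(offset).limit(1)
        docs.extend(cursor)
    return docs
```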
My solution in PHP: