Currently I fetch documents by iterating through cursor in pymongo, for example:
    for d in db.docs.find():
        mylist.append(d)
For reference, performing a fetchall on the same set of data (7M records) takes around 20 seconds, while the method above takes a few minutes.
Is there a faster way to read bulk data in Mongo? Sorry, I'm new to Mongo; please let me know if more information is needed.
You only need to cast the cursor with the list() function instead of appending document by document. Also, sorting on $natural will bypass the index and return the documents in the order in which they are stored on disk, meaning that Mongo doesn't have to thrash around with random reads on your disk.
https://docs.mongodb.com/manual/reference/method/cursor.sort/#return-natural-order
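Putting both together, something like this (a sketch; the connection string and the mydb database name are placeholders for your setup):

    import pymongo

    client = pymongo.MongoClient("mongodb://localhost:27017")  # assumed connection string
    db = client["mydb"]  # placeholder database name

    # Build the list in one call instead of appending per document, and
    # sort on $natural so documents come back in on-disk order.
    mylist = list(db.docs.find().sort("$natural", pymongo.ASCENDING))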
Performance degrades severely if you combine this with a query, though. And you should never rely on FIFO ordering: Mongo allows itself to move documents around within its storage layer. If you don't care about the order, so be it.
In Python, you also want to use an EXHAUST cursor type, which tells the Mongo server to stream back the results without waiting for the pymongo driver to acknowledge each batch:
https://api.mongodb.com/python/current/api/pymongo/cursor.html#pymongo.cursor.CursorType.EXHAUST
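For example (again a sketch with placeholder connection details; note that exhaust cursors are not supported against mongos/sharded clusters):

    import pymongo
    from pymongo.cursor import CursorType

    client = pymongo.MongoClient("mongodb://localhost:27017")  # assumed connection string
    db = client["mydb"]  # placeholder database name

    # EXHAUST makes the server keep streaming batches without waiting
    # for the driver to request each one.
    docs = list(db.docs.find({}, cursor_type=CursorType.EXHAUST))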
Mind you, it'll never be as fast as the shell. The slowest aspect of moving data through mongo/bson -> pymongo -> you is UTF-8 string decoding within Python.
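If you mostly pass documents along untouched, one way to sidestep part of that decoding cost is to ask pymongo for raw BSON documents, which defer field decoding until you actually access a field. A sketch, assuming pymongo 3.x and the same placeholder names as above:

    import pymongo
    from bson.codec_options import CodecOptions
    from bson.raw_bson import RawBSONDocument

    client = pymongo.MongoClient("mongodb://localhost:27017")  # assumed connection string
    coll = client["mydb"].get_collection(
        "docs",
        codec_options=CodecOptions(document_class=RawBSONDocument),
    )

    # Each result is a thin wrapper over raw BSON bytes; string decoding
    # is deferred until a field is accessed (doc.raw exposes the bytes).
    raw_docs = list(coll.find())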