We are troubled by occasionally occurring "cursor not found" exceptions for some Morphia queries' asList calls, and I've found a hint on SO that this might be quite memory-consumptive.
Now I'd like to know a bit more about the background: can somebody explain (in plain English) what a cursor (in MongoDB) actually is? Why can it be kept open, or not be found?
The documentation defines a cursor as:
A pointer to the result set of a query. Clients can iterate through a cursor to retrieve results. By default, cursors timeout after 10 minutes of inactivity
But this is not very telling. Maybe it could be helpful to define a batch for query results, because the documentation also states:
The MongoDB server returns the query results in batches. Batch size will not exceed the maximum BSON document size. For most queries, the first batch returns 101 documents or just enough documents to exceed 1 megabyte. Subsequent batch size is 4 megabytes. [...] For queries that include a sort operation without an index, the server must load all the documents in memory to perform the sort before returning any results.
Note: the queries in question don't use sort statements at all, but also no limit or offset.
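To make the batching behavior concrete, here is a toy simulation of how results come back from the server: the first reply carries roughly 101 documents, and each subsequent getMore fetches another chunk. This is a sketch, not the driver API, and the subsequent-batch *count* is illustrative (the real server limits later batches by size, not by document count):

```javascript
// Toy model of server-side batching: the initial find() reply yields
// the first batch, then each getMore yields the next chunk. Batch
// counts are illustrative; the real limit on later batches is by size.
function* batches(docs, firstBatch = 101, nextBatch = 500) {
  yield docs.slice(0, firstBatch);            // initial find() reply
  for (let i = firstBatch; i < docs.length; i += nextBatch) {
    yield docs.slice(i, i + nextBatch);       // each subsequent getMore
  }
}

const docs = Array.from({ length: 1200 }, (_, n) => n);
const sizes = [...batches(docs)].map((b) => b.length);
console.log(sizes); // [101, 500, 500, 99]
```

The point is that the client never receives the whole result set in one reply; the cursor is what ties the successive getMore requests together on the server.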
I am by no means a MongoDB expert, but I just want to add some observations from working in a medium-sized Mongo system for the last year. Also thanks to @xameeramir for the excellent walkthrough of how cursors work in general.
A "cursor not found" exception may have several causes. One that I have noticed is explained in this answer.
The cursor lives server-side. It is not distributed over a replica set but exists on the instance that is primary at the time of creation. This means that if another instance takes over as primary, the cursor will be lost to the client. If the old primary is still up and around, it may still exist there, but to no use. I guess it is garbage-collected away after a while. So if your Mongo replica set is unstable, or you have a shaky network in front of it, you are out of luck when doing any long-running queries.
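Since a failover invalidates the server-side cursor, the only client-side remedy is to rerun the query from scratch. A minimal sketch of such a retry wrapper, assuming the error message mentions the lost cursor (the function name and the error-matching pattern are my own illustration, not a driver feature):

```javascript
// Sketch: rerun a query from scratch when the server-side cursor has
// disappeared (e.g. after a primary step-down). `runQuery` is any
// function that consumes a cursor and may throw a "cursor not found"
// style error; all names here are illustrative, not part of the driver.
async function withRetry(runQuery, attempts = 3) {
  for (let i = 0; i < attempts; i += 1) {
    try {
      return await runQuery();
    } catch (err) {
      const cursorLost = /cursor.*not found/i.test(String(err.message));
      if (!cursorLost || i === attempts - 1) throw err;
      // The cursor is gone for good; restart the whole query.
    }
  }
}
```

Note that restarting re-reads documents that were already processed, so the per-document work should be idempotent for this to be safe.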
If the full content of what the cursor wants to return does not fit in memory on the server, the query may be very slow. The RAM on your servers needs to be larger than the largest query you run.
All this can partly be avoided by better design. For a use case with large, long-running queries you may be better off with several smaller collections instead of one big one.
find() and Cursors in the Node.js Driver
Notice that the call to toArray makes the application fetch the entire dataset at once.
Notice that the cursor returned by find() is assigned to var cursor. With this approach, instead of fetching all data into memory and consuming it at once, we're streaming the data to our application. find() can create a cursor immediately because it doesn't actually make a request to the database until we try to use some of the documents it will provide. The point of cursor is to describe our query. The second parameter to cursor.forEach shows what to do when the driver is exhausted or an error occurs.
In the initial version of the above code, it was toArray() which forced the database call. It meant we needed ALL the documents and wanted them to be in an array.
Also, MongoDB returns data in batch format. The image below shows requests from cursors (from the application) to MongoDB.
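The snippet the answer walks through is not reproduced here; a hypothetical sketch of what it likely looked like, where the collection name, the query, and the callback bodies are assumptions rather than the original code:

```javascript
// Hypothetical reconstruction of the code being described; `collection`
// stands for a collection handle obtained from the Node.js driver.
function streamRestaurants(collection) {
  // find() returns at once: the cursor only *describes* the query, and
  // no documents are requested until we start consuming them.
  var cursor = collection.find({ borough: 'Manhattan' });

  // The driver pulls documents batch by batch; the first callback runs
  // per document, the second when the cursor is exhausted or errors.
  cursor.forEach(
    function (doc) { console.log(doc.name); },
    function (err) { if (err) console.error(err); }
  );
}
```

The two-callback forEach shape mirrors what the answer describes; newer driver versions expose promise-based iteration instead.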
forEach is better than toArray because we can process documents as they come in, until we reach the end. Contrast it with toArray, where we wait for ALL the documents to be retrieved and the entire array to be built. This means we're not getting any advantage from the fact that the driver and the database system are working together to batch results to our application. Batching is meant to provide efficiency in terms of memory overhead and execution time. Take advantage of it if you can in your application.
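The memory difference can be illustrated with a toy cursor that mimics the two consumption styles (a simulation of the contract, not the driver's actual implementation):

```javascript
// Toy cursor: toArray() buffers every document before returning, while
// forEach() hands documents over one at a time. Simulation only.
function makeCursor(docs) {
  let i = 0;
  const next = () => (i < docs.length ? docs[i++] : null);
  return {
    toArray() {            // whole result set lives in memory at once
      const all = [];
      let d;
      while ((d = next()) !== null) all.push(d);
      return all;
    },
    forEach(fn) {          // only one document is held at a time
      let d;
      while ((d = next()) !== null) fn(d);
    },
  };
}

const docs = Array.from({ length: 1000 }, (_, n) => ({ n }));

const all = makeCursor(docs).toArray();            // 1000 objects buffered
let sum = 0;
makeCursor(docs).forEach((d) => { sum += d.n; });  // constant memory
console.log(all.length, sum); // 1000 499500
```

With toArray the peak memory is proportional to the result set; with forEach it stays flat regardless of how many documents the query returns.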