We read an XML file (using xml-stream) with about 500k elements and insert them into MongoDB like this:
xml.on(`endElement: product`, writeDataToDb.bind(this, "product"));
The insert in writeDataToDb(type, obj) looks like this:
collection.insertOne(obj, {w: 1, wtimeout: 15000}).catch((e) => { });
Now when the Mongo connection gets disconnected, the xml stream still reads and the console gets flooded with error messages (can't insert, disconnected, EPIPE broken, ...).
In the docs it says:
When you shut down the mongod process, the driver stops processing operations and keeps buffering them due to bufferMaxEntries being -1 by default meaning buffer all operations.
What does this buffer actually do?
We notice that when we insert data and shut down the Mongo server, the operations get buffered. When we bring the Mongo server back up, the native driver reconnects successfully and Node resumes inserting data, but the documents buffered while Mongo was offline do not get inserted.
So I question this buffer and its use.
Goal:
We are looking for the best way to keep inserts in a buffer until Mongo comes back (within 15000 milliseconds, according to wtimeout) and then insert the buffered documents, or to make use of xml.pause() and xml.resume(), which we tried without success.
Basically we need a little help in how to handle disconnects without data loss or interrupts.
Inserting 500K elements one at a time with insertOne() is a very bad idea. You should instead use bulk operations, which allow you to insert many documents in a single request (for example 10000 per request, so the whole file takes 50 requests). To avoid the buffering issue, you can handle it manually:
bufferMaxEntries: 0
reconnectTries: 30, reconnectInterval: 1000
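The manual handling described above can be sketched roughly as follows. This is a minimal sketch under assumptions, not the script from this answer: `createBatchWriter`, `sleep`, `maxAttempts` and `retryDelayMs` are hypothetical names, `stream` is anything with `pause()` and `resume()` (such as the xml-stream instance), and `collection` is anything with an `insertMany(docs)` method that returns a promise. The three driver options are the ones listed above; whether they are accepted depends on the driver version in use.

```javascript
// Connection options from the list above. Assumption: a driver version that
// still accepts them, passed as MongoClient.connect(url, driverOptions).
const driverOptions = {
  bufferMaxEntries: 0,     // fail immediately instead of buffering while disconnected
  reconnectTries: 30,      // number of reconnect attempts
  reconnectInterval: 1000, // milliseconds between reconnect attempts
};

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: collects documents and writes them in batches,
// pausing the stream while a batch is being written.
function createBatchWriter(stream, collection, options = {}) {
  const { batchSize = 10000, maxAttempts = 30, retryDelayMs = 1000 } = options;
  let batch = [];
  let queue = Promise.resolve(); // serializes the batch writes
  let inFlight = 0;              // batches queued or currently being written

  // Try to write one batch; keep it and try again when the insert fails.
  async function writeWithRetry(docs) {
    for (let attempt = 1; ; attempt++) {
      try {
        await collection.insertMany(docs);
        return;
      } catch (err) {
        if (attempt >= maxAttempts) throw err; // give up, let the caller decide
        await sleep(retryDelayMs);
      }
    }
  }

  // Queue the current batch for writing and pause the stream meanwhile.
  function flush() {
    const docs = batch;
    batch = [];
    if (docs.length === 0) return queue;
    inFlight++;
    stream.pause();
    queue = queue
      .then(() => writeWithRetry(docs))
      .then(() => {
        // Resume reading only when no batch is left waiting.
        if (--inFlight === 0) stream.resume();
      });
    return queue;
  }

  return {
    // Add one document; a write is triggered when the batch is full.
    add(doc) {
      batch.push(doc);
      return batch.length >= batchSize ? flush() : queue;
    },
    flush,
  };
}
```

With xml-stream this would be wired up by calling `writer.add(item)` in the `endElement: product` handler and `writer.flush()` in the `end` handler, handling rejections in both places. The sketch retries a whole batch and does not deal with a batch that was only partially written before the connection dropped, so a real import would also need to handle documents that already exist.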
here is a sample script :
sample log output:
I don't know the MongoDB driver or this buffer of entries specifically. Maybe it only keeps data in specific scenarios.
So I will answer this question with a more general approach that can work with any database.
To summarize, you have two problems:
To handle the first issue, you need to implement a retry algorithm that will ensure that many attempts are made before giving up.
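A retry algorithm of this kind could look roughly like the sketch below. It is only an illustration: `withRetry`, `wait`, and the option names `attempts`, `baseDelayMs` and `factor` are hypothetical, and the growing delay between attempts (exponential backoff) is one possible choice rather than something the database requires.

```javascript
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: runs `operation` until it succeeds or the attempts
// run out. `operation` must return a promise.
async function withRetry(operation, { attempts = 5, baseDelayMs = 500, factor = 2 } = {}) {
  let delay = baseDelayMs;
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await wait(delay); // wait before the next attempt
        delay *= factor;   // wait longer after each failure
      }
    }
  }
  throw lastError; // every attempt failed
}
```

The insert from the question would then be wrapped as `withRetry(() => collection.insertOne(obj, {w: 1, wtimeout: 15000}))`, with the final rejection handled by the caller instead of being swallowed by an empty catch.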
To handle the second issue, you need to implement back pressure on the xml stream. You can do that using the pause method, the resume method and an input buffer.
Play with it, put some console.log calls in to understand how it behaves. I hope this will help you to solve your issue :)
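The back pressure idea (pause, resume and an input buffer) can be sketched as follows. This is a minimal sketch under assumptions: `createBackPressure`, `highWaterMark` and `lowWaterMark` are hypothetical names, `stream` is anything with `pause()` and `resume()`, and `write(item)` is any function returning a promise, for example an insert that already retries on failure.

```javascript
// Hypothetical helper: returns a push(item) function that buffers items,
// writes them one by one, and pauses the stream when the buffer gets full.
function createBackPressure(stream, write, options = {}) {
  const { highWaterMark = 100 } = options;
  const { lowWaterMark = Math.floor(highWaterMark / 2) } = options;
  const buffer = [];    // input buffer: items read but not yet written
  let draining = false; // true while the write loop is running
  let paused = false;   // true while we have paused the stream

  // Write buffered items in order until the buffer is empty.
  async function drain() {
    draining = true;
    try {
      while (buffer.length > 0) {
        await write(buffer[0]);
        buffer.shift(); // drop the item only after it was written
        if (paused && buffer.length <= lowWaterMark) {
          paused = false;
          stream.resume(); // enough room again, continue reading
        }
      }
    } finally {
      draining = false; // a later push can restart the loop after an error
    }
  }

  return function push(item) {
    buffer.push(item);
    if (!paused && buffer.length >= highWaterMark) {
      paused = true;
      stream.pause(); // too many pending items, stop reading
    }
    return draining ? Promise.resolve() : drain();
  };
}
```

The returned function would be called from the `endElement: product` handler, with rejections handled there. The buffer keeps accepting items while the stream is paused, because elements that were already parsed may still arrive after `pause()` is called. An item stays at the front of the buffer until its write succeeds, so a failed write does not lose it.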