I'm writing a large file with node.js using a writable stream:
    var fs = require('fs');
    var stream = fs.createWriteStream('someFile.txt', { flags : 'w' });
    var lines;
    while (lines = getLines()) {
        for (var i = 0; i < lines.length; i++) {
            stream.write( lines[i] );
        }
    }
I'm wondering if this scheme is safe without using the drain event. If it is not (which I think is the case), what is the pattern for writing an arbitrarily large amount of data to a file?
I found streams to be a poorly performing way to deal with large files, because you cannot set an adequate input buffer size (at least I'm not aware of a good way to do it). This is what I do:
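A minimal sketch of this kind of approach, assuming synchronous fs calls with an explicitly sized buffer. The function name copyWithBuffer, the paths passed to it, and the 1 MiB default are illustrative choices, not necessarily what the answer originally used:

```javascript
const fs = require('fs');

// Copy src to dest through a buffer of a caller-chosen size, synchronously.
// Returns the total number of bytes copied.
function copyWithBuffer(src, dest, bufferSize = 1024 * 1024) {
  const input = fs.openSync(src, 'r');
  const output = fs.openSync(dest, 'w');
  const buffer = Buffer.alloc(bufferSize);
  let total = 0;
  try {
    let bytesRead;
    // readSync returns 0 once the end of the file is reached
    while ((bytesRead = fs.readSync(input, buffer, 0, buffer.length, null)) > 0) {
      // writeSync may write fewer bytes than requested, so loop until done
      let written = 0;
      while (written < bytesRead) {
        written += fs.writeSync(output, buffer, written, bytesRead - written);
      }
      total += bytesRead;
    }
  } finally {
    fs.closeSync(input);
    fs.closeSync(output);
  }
  return total;
}
```

Because every call blocks, this trades concurrency for predictable memory use: at most bufferSize bytes are held at a time.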
Several suggested answers to this question have missed the point about streams altogether.
This module can help https://www.npmjs.org/package/JSONStream
However, let's suppose the situation is as described and write the code ourselves. You are reading from MongoDB as a stream, with objectMode = true by default.
This will lead to issues if you try to stream directly to a file, something like an "Invalid non-string/buffer chunk" error.
The solution to this type of problem is very simple.
Just put another Transform between the readable and the writable to adapt the object-mode readable to a string writable.
Sample Code Solution:
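A minimal sketch of such an adapter, assuming each object should become one line of JSON. The names createStringifier and writeObjectsToFile are hypothetical, and objectSource stands in for the MongoDB stream (any object-mode readable would do):

```javascript
const { Transform, pipeline } = require('stream');
const fs = require('fs');

// Transform whose writable side accepts objects and whose readable side
// emits strings, so it can sit in front of a file stream.
function createStringifier() {
  return new Transform({
    writableObjectMode: true,
    transform(doc, _encoding, callback) {
      // Emit one JSON document per line
      callback(null, JSON.stringify(doc) + '\n');
    },
  });
}

// Pipe an object-mode source through the stringifier into a file.
// done(err) is called once the file has been written or a stream failed.
function writeObjectsToFile(objectSource, filePath, done) {
  pipeline(objectSource, createStringifier(), fs.createWriteStream(filePath), done);
}
```

pipeline is used here rather than chained pipe() calls so that an error in any of the three streams reaches the done callback.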
The idea behind drain is that you would test the return value of stream.write() inside your loop and wait for drain when it is false, which you're not doing. So you would need to rearchitect to make it "reentrant".
However, does this mean that you need to keep buffering getLines as well while you wait?
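One way to sketch a drain-aware loop, assuming a Node.js version that provides events.once. The helper name writeAll is hypothetical:

```javascript
const { once } = require('events');

// Write every chunk from an iterable, pausing whenever write() returns false.
async function writeAll(stream, chunks) {
  for (const chunk of chunks) {
    if (!stream.write(chunk)) {
      // The internal buffer has reached highWaterMark: wait for it to flush.
      // once() also rejects if the stream emits 'error' in the meantime.
      await once(stream, 'drain');
    }
  }
}
```

If chunks is a generator, the next line is only produced after the pending drain has fired, so the producer does not have to buffer its own output while waiting. A caller would typically call stream.end() once the returned promise resolves.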
The cleanest way to handle this is to make your line generator a readable stream - let's call it lineReader. Then the following would automatically handle the buffers and draining nicely for you:

If you don't want to make a readable stream, you can listen to write's output for buffer-fullness and respond like this:

A longer example of this situation can be found here.
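The readable-stream approach can be sketched as follows, assuming a Node.js version with Readable.from and stream.pipeline. The generator generateLines is a hypothetical stand-in for whatever getLines() reads from, and writeLines is an illustrative name:

```javascript
const { Readable, pipeline } = require('stream');
const fs = require('fs');

// Hypothetical line source; replace with the real producer of lines.
function* generateLines(count) {
  for (let i = 0; i < count; i++) {
    yield `line ${i}\n`;
  }
}

// Wrap the generator in a readable stream and pipe it into a file.
// The stream machinery pauses the generator while the file stream is full.
function writeLines(filePath, count, done) {
  const lineReader = Readable.from(generateLines(count));
  pipeline(lineReader, fs.createWriteStream(filePath), done);
}
```

Since the generator is only advanced when the readable side is asked for more data, no explicit drain handling is needed in user code.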
That's how I finally did it. The idea is to create a readable stream implementing the ReadStream interface and then use the pipe() method to pipe data to the writable stream.

An example of the MyReadStream class can be taken from mongoose QueryStream.

If you do not happen to have an input stream, you cannot easily use pipe. None of the above worked for me; the drain event doesn't fire. Solved as follows (based on Tyler's answer):
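A minimal sketch of one way to do this without relying on drain, assuming the callback argument of write() is used to sequence the writes. The name writeSequentially is hypothetical, and this is not necessarily the code the answer originally contained:

```javascript
// Write the lines one at a time; each write starts only after the callback
// of the previous write has fired, so at most one chunk is pending.
function writeSequentially(stream, lines, done) {
  let i = 0;
  function next(err) {
    if (err) return done(err);
    // All lines written: close the stream and report completion
    if (i >= lines.length) return stream.end(done);
    stream.write(lines[i++], next);
  }
  next();
}
```

This keeps memory use flat at the cost of throughput, since the stream's internal buffer is never filled beyond a single chunk.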