Intro
These are my first adventures in writing server-side node.js. It's been fun so far, but I'm having some difficulty understanding the proper way to implement something as it relates to node.js streams.
Problem
For testing and learning purposes I'm working with large files whose content is zlib compressed. The compressed content is binary data, each packet being 38 bytes in length. I'm trying to create a resulting file that looks almost identical to the original file, except that there is an uncompressed 31-byte header for every 1024 38-byte packets.
original file content (decompressed)
+----------+----------+----------+----------+
| packet 1 | packet 2 | ...... | packet N |
| 38 bytes | 38 bytes | ...... | 38 bytes |
+----------+----------+----------+----------+
resulting file content
+----------+--------------------------------+----------+--------------------------------+
| header 1 | 1024 38 byte packets | header 2 | 1024 38 byte packets |
| 31 bytes | zlib compressed | 31 bytes | zlib compressed |
+----------+--------------------------------+----------+--------------------------------+
As you can see, it's somewhat of a translation problem: I'm taking a source stream as input and slightly transforming it into an output stream. Therefore, it felt natural to implement a Transform stream.
The class simply attempts to accomplish the following:

- Take a stream as input.
- zlib-inflate the chunks of data to count the packets, gather 1024 of them together, zlib-deflate the result, and prepend a header.
- Pass the new resulting chunk on through the pipeline via this.push(chunk).
A use case would be something like:
var fs = require('fs');
var me = require('./me'); // Where my Transform stream code sits
var inp = fs.createReadStream('depth_1000000');
var out = fs.createWriteStream('depth_1000000.out');
inp.pipe(me.createMyTranslate()).pipe(out);
Question(s)
Assuming Transform is a good choice for this use case, I seem to be running into a possible back-pressure issue. My call to this.push(chunk) within _transform keeps returning false. Why would this be, and how should such things be handled?
Mike Lippert's answer is the closest to the truth, I think. It appears that waiting for a new _read() call from the reading stream is the only way that the Transform is actively notified that the reader is ready. I wanted to share a simple example of how I override _read() temporarily.

I ended up following Ledion's example and created a utility Transform class which assists with backpressure. The utility adds an async method named addData, which the implementing Transform can await.
Using this utility class, my Transforms look like this now:
I think Transform is suitable for this, but I would perform the inflate as a separate step in the pipeline.

Here's a quick and largely untested example:
Ran into a similar problem lately, needing to handle backpressure in an inflating transform stream - the secret to handling push() returning false is to register and handle the 'drain' event on the stream.

NOTE: this is a bit hacky, as we're reaching into the internals and pipes can even be an array of Readables, but it does work in the common case of ...pipe(transform).pipe(...).
Would be great if someone from the Node community can suggest a "correct" method for handling .push() returning false.

This question from 2013 is all I was able to find on how to deal with "back pressure" when creating node Transform streams.
From the node 7.10.0 Transform stream and Readable stream documentation, what I gathered was that once push returned false, nothing else should be pushed until _read was called.

The Transform documentation doesn't mention _read except to note that the base Transform class implements it (and _write). I found the information about push returning false and _read being called in the Readable stream documentation.

The only other authoritative comment I found on Transform back pressure only mentioned it as an issue, and that was in a comment at the top of the node file _stream_transform.js.
Here's the section about back pressure from that comment:
Solution example
Here's the solution I pieced together to handle the back pressure in a Transform stream which I'm pretty sure works. (I haven't written any real tests, which would require writing a Writable stream to control the back pressure.)
This is a rudimentary Line transform which needs work as a line transform but does demonstrate handling the "back pressure".
I tested the above by running it with the DEBUG lines uncommented on a ~10000 line, ~200KB file. Redirect stdout or stderr to a file (or both) to separate the debugging statements from the expected output (node test.js > out.log 2> err.log).

Helpful debugging hint
While writing this initially, I didn't realize that _read could be called before _transform returned, so I hadn't implemented the this._transforming guard and I was getting an error.

Looking at the node implementation, I realized that the error meant that the callback given to _transform was being called more than once. There wasn't much information to be found about this error either, so I thought I'd include what I figured out here.

push will return false if the stream you are writing to (in this case, a file output stream) has too much data buffered. Since you're writing to disk, this makes sense: you are processing data faster than you can write it out.

When out's buffer is full, your transform stream will fail to push and will start buffering data itself. If that buffer should fill, then inp's will start to fill. This is how things should be working. The piped streams are only going to process data as fast as the slowest link in the chain can handle it (once your buffers are full).