How do Node.js Streams work?

2019-01-22 09:10发布

I have a question about Node.js streams - specifically how they work conceptually.

There is no lack of documentation on how to use streams. But I've had difficulty finding how streams work at the data level.

My limited understanding of web communication, HTTP, is that full "packages" of data are sent back and forth. Similar to an individual ordering a company's catalogue, a client sends a GET (catalogue) request to the server, and the server responds with the catalogue. The browser doesn't receive a page of the catalogue, but the whole book.

Are node streams perhaps multipart messages?

I like the REST model - especially that it is stateless. Every single interaction between the browser and server is completely self contained and sufficient. Are node streams therefore not RESTful? One developer mentioned the similarity with socket pipes, which keep the connection open. Back to my catalogue ordering example, would this be like an infomercial with the line "But wait! There's more!" instead of the fully contained catalogue?

A large part of streams is the ability for the receiver 'down-stream' to send messages like 'pause' & 'continue' upstream. What do these messages consist of? Are they POST?

Finally, my limited visual understanding of how Node works includes this event loop. Functions can be placed on separate threads from the thread pool, and the event loop carries on. But shouldn't sending a stream of data keep the event loop occupied (i.e. stopped) until the stream is complete? How is it ALSO keeping watch for the 'pause' request from downstream?n Does the event loop place the stream on another thread from the pool and when it encounters a 'pause' request, retrieve the relevant thread and pause it?

I've read the node.js docs, completed the nodeschool tutorials, built a heroku app, purchased TWO books (real, self contained, books, kinda like the catalogues spoken before and likely not like node streams), asked several "node" instructors at code bootcamps - all speak about how to use streams but none speak about what's actually happening below.

Perhaps you have come across a good resource explaining how these work? Perhaps a good anthropomorphic analogy for a non CS mind?

3条回答
Ridiculous、
2楼-- · 2019-01-22 09:21

I think you are overthinking how all this works and I like it.

What streams are good for

Streams are good for two things:

  • when an operation is slow and it can give you partials results as it gets them. For example read a file, it is slow because HDDs are slow and it can give you parts of the file as it reads it. With streams you can use these parts of the file and start to process them right away.

  • they are also good to connect programs together (read functions). Just as in the command line you can pipe different programs together to produce the desired output. Example: cat file | grep word.

How they work under the hood...

Most of these operations that take time to process and can give you partial results as it gets them are not done by Node.js they are done by the V8 JS Engine and it only hands those results to JS for you to work with them.

To understand your http example you need to understand how http works

There are different encodings a web page can be send as. In the beginning there was only one way. Where a whole page was sent when it was requested. Now it has more efficient encodings to do this. One of them is chunked where parts of the web page are sent until the whole page is sent. This is good because a web page can be processed as it is received. Imagine a web browser. It can start to render websites before the download is complete.

Your .pause and .continue questions

First, Node.js streams only work within the same Node.js program. Node.js streams can't interact with a stream in another server or even program.

That means that in the example below, Node.js can't talk to the webserver. It can't tell it to pause or resume.

Node.js <-> Network <-> Webserver

What really happens is that Node.js asks for a webpage and it starts to download it and there is no way to stop that download. Just dropping the socket.

So, what really happens when you make in Node.js .pause or .continue?

It starts to buffer the request until you are ready to start to consume it again. But the download never stopped.

Event Loop

I have a whole answer prepared to explain how the Event Loop works but I think it is better for you to watch this talk.

查看更多
男人必须洒脱
3楼-- · 2019-01-22 09:27

The first thing to note is: node.js streams are not limited to HTTP requests. HTTP requests / Network resources are just one example of a stream in node.js.

Streams are useful for everything that can be processed in small chunks. They allow you to process potentially huge resources in smaller chunks that fit into your RAM more easily.

Say you have a file (several gigabytes in size) and want to convert all lowercase into uppercase characters and write the result to another file. The naive approach would read the whole file using fs.readFile (error handling omitted for brevity):

fs.readFile('my_huge_file', function (err, data) {
    var convertedData = data.toString().toUpperCase();

    fs.writeFile('my_converted_file', convertedData);
});

Unfortunately this approch will easily overwhelm your RAM as the whole file has to be stored before processing it. You would also waste precious time waiting for the file to be read. Wouldn't it make sense to process the file in smaller chunks? You could start processing as soon as you get the first bytes while waiting for the hard disk to provide the remaining data:

var readStream = fs.createReadStream('my_huge_file');
var writeStream = fs.createWriteStream('my_converted_file');
readStream.on('data', function (chunk) {
    var convertedChunk = chunk.toString().toUpperCase();
    writeStream.write(convertedChunk);
});
readStream.on('end', function () {
    writeStream.end();
});

This approach is much better:

  1. You will only deal with small parts of data that will easily fit into your RAM.
  2. You start processing once the first byte arrived and don't waste time doing nothing, but waiting.

Once you open the stream node.js will open the file and start reading from it. Once the operating system passes some bytes to the thread that's reading the file it will be passed along to your application.


Coming back to the HTTP streams:

  1. The first issue is valid here as well. It is possible that an attacker sends you large amounts of data to overwhelm your RAM and take down (DoS) your service.
  2. However the second issue is even more important in this case: The network may be very slow (think smartphones) and it may take a long time until everything is sent by the client. By using a stream you can start processing the request and cut response times.

On pausing the HTTP stream: This is not done at the HTTP level, but way lower. If you pause the stream node.js will simply stop reading from the underlying TCP socket. What is happening then is up to the kernel. It may still buffer the incoming data, so it's ready for you once you finished your current work. It may also inform the sender at the TCP level that it should pause sending data. Applications don't need to deal with that. That is none of their business. In fact the sender application probably does not even realize that you are no longer actively reading!

So it's basically about being provided data as soon as it is available, but without overwhelming your resources. The underlying hard work is done either by the operating system (e.g. net, fs, http) or by the author of the stream you are using (e.g. zlib which is a Transform stream and usually bolted onto fs or net).

查看更多
家丑人穷心不美
4楼-- · 2019-01-22 09:35

The below chart seems to be a pretty accurate 10.000 feet overview / diagram for the the node streams class.

It represents streams3, contributed by Chris Dickinson.

enter image description here

查看更多
登录 后发表回答