I have used Node.js for a while now and I just realized it can be blocking. I just cannot wrap my brain around the conditions under which Node.js becomes blocking.
- So, Node.js is single-threaded because (i) JavaScript is and (ii) it avoids all the multi-threaded pitfalls.
- To do a lot of things at once, despite being single-threaded, it implements asynchronous execution. So, talking with the DB (the I/O in general) is non-blocking (because it is asynchronous).
- But all the incoming requests to do some work (i.e. talk with the DB) and all the results of that work that must go back to the client (i.e. send some data) use that single thread.
- Node.js uses the "event loop" inside that single thread to get all the requests and assign them to non-blocking I/O tasks.
So the I/O tasks are non-blocking because of asynchronous callbacks, but the single thread can be blocking, because it's synchronous and because the event loop can get jammed when a lot of complicated requests show up at the same time?
- Am I right, did I understand this correctly? I guess I don't, because here and here they emphasize that "Node is single-threaded which means none of your code runs in parallel". What does that actually mean, and how does it make Node blocking?
- So, does the event loop run forever and constantly search for requests, or does it start executing only after it spots a new request?
- Does the Node blocking weakness render Node useless for big projects and make it suitable only for micro-sites and small projects?
Thanks a lot.
First, to be clear: node.js as a whole isn't single-threaded. Node does have a thread pool via libuv that it uses to perform some tasks that are either currently impossible to do efficiently from a single thread on most platforms (e.g. file I/O) or are inherently computation intensive (e.g. zlib). It should be noted that most of the crypto module (which would also be inherently computation intensive) currently does not have an async/non-blocking interface (except for crypto.randomBytes()). V8 also utilizes multiple threads to do things like garbage collection, optimization of functions, etc.
However just about everything else in node does occur within the same, single thread.
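To make that split concrete, here is a minimal sketch (not part of the original answer) of two calls that libuv hands off, so the single JS thread keeps running while they complete in the background:

```js
const fs = require('fs');
const crypto = require('crypto');

// File I/O is performed on libuv's thread pool; the callback is queued back
// onto the single JS thread when the read is done.
fs.readFile(__filename, 'utf8', (err, data) => {
  if (err) throw err;
  console.log('file read finished,', data.length, 'characters');
});

// The async form of crypto.randomBytes() is likewise non-blocking.
crypto.randomBytes(16, (err, buf) => {
  if (err) throw err;
  console.log('random bytes ready:', buf.toString('hex'));
});

console.log('this prints first: the JS thread never waited');
```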
Now to address your questions specifically:
The fact that the javascript code is run from a single thread doesn't make node block. As this answer explains, node is foremost about (I/O) concurrency rather than (code) parallelism. You could run node code in parallel by utilizing the built-in cluster module, for example, on a multi-core/CPU system, but node's primary goal is to be able to handle a lot of I/O concurrently without dedicating one thread per socket/server/etc.

There is a good, detailed writeup here that describes how the event loop in node works.
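A minimal sketch of that cluster approach, assuming a multi-core machine (the port number is arbitrary):

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; each worker is a separate process,
  // so their javascript really does run in parallel.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // All workers share the same listening port; incoming connections are
  // distributed among them.
  http.createServer((req, res) => {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(8000);
}
```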
Node's primary goal as previously described is to handle I/O really well, which fits with the majority of use cases for web applications and any kind of network programs for example.
If your script is CPU-bound (e.g. you're calculating pi or transcoding audio/video), you are probably better off delegating that work to a child process in node (e.g. calling out to ffmpeg for transcoding instead of doing it in javascript or synchronously in a C++ node addon on node's main thread). You could do these blocking things in-process if you aren't doing anything else at the same time (like handling HTTP requests). There are many people who will use node in this way for performing various utility tasks where I/O concurrency isn't as important. One example of this might be a script that performs minification, linting, and/or bundling of js and css files, or a script that creates thumbnails from a large set of images.

However, if your script instead creates a TCP or HTTP server for example that pulls information from a database, formats it, and sends it back to the user, then node will be good at doing that because the majority of the time spent in the process is just waiting for sockets/HTTP clients to send (more) data and waiting for the database to reply with results from queries.
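As a rough sketch of the "delegate it to a child process" idea above, assuming ffmpeg is installed on the machine (the flags and file names here are only placeholders):

```js
const { spawn } = require('child_process');

// Transcoding happens in a separate ffmpeg process, so node's single JS
// thread stays free to keep serving requests in the meantime.
const ffmpeg = spawn('ffmpeg', ['-i', 'input.mp4', 'output.webm']);

ffmpeg.on('close', (code) => {
  console.log('ffmpeg exited with code', code);
});
```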
Let's get straight to the answers.
JSON encoding is basically string manipulation. That can be slow in a lot of languages, not just JavaScript. So if encoding such a JSON takes 20 seconds, you will load the CSV file (asynchronously), but then you'll spend 20 seconds manipulating strings. During that time nothing else can come in: no other callbacks, no other requests that you could send to the database/file system in the meantime. None of your code runs except that single JSON.stringify() call.
There are ways around this particular problem, but you should be aware of it: if a single function or a single statement like JSON.stringify takes a long time, it will block. You need to program your apps with that in mind.
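To make that concrete, here is a small self-contained sketch (the object size is arbitrary and may need adjusting on your machine): a timer that is due in 10 ms can only fire once the synchronous JSON.stringify call has finished.

```js
// Build a reasonably large object to serialize.
const big = { rows: [] };
for (let i = 0; i < 1e6; i++) {
  big.rows.push({ id: i, name: 'row ' + i, values: [i, i * 2, i * 3] });
}

const start = Date.now();

setTimeout(() => {
  // Due after 10 ms, but it cannot run until the stringify below is done.
  console.log('timer fired after', Date.now() - start, 'ms');
}, 10);

// Synchronous: nothing else (timers, I/O callbacks, new requests) runs
// until this single statement returns.
const json = JSON.stringify(big);
console.log('stringify took', Date.now() - start, 'ms for', json.length, 'characters');
```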
Say you're parsing that JSON from 1) above, and in the meantime you receive 5 new requests for that or other stuff. Those 5 requests go straight to the queue, and as each callback finishes, the event loop checks for the next one to be processed. If there aren't any, it waits.
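To see that queueing in action, here is a toy HTTP server (my own sketch; the port and timings are arbitrary) where one deliberately blocking route makes every other request wait:

```js
const http = require('http');

// Simulate a CPU-bound handler by spinning synchronously for `ms` milliseconds.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* burn CPU synchronously */ }
}

http.createServer((req, res) => {
  if (req.url === '/slow') {
    blockFor(2000);           // blocks the single JS thread for ~2 seconds
    res.end('slow done\n');
  } else {
    res.end('fast done\n');   // normally answers almost instantly
  }
}).listen(3000);
```

If you request /slow and then immediately /fast (e.g. with two curl commands), the fast response still takes about two seconds, because its callback sits in the queue until the slow one has released the thread.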
Node is already used in large-scale projects; I'm sure you can find many if you google a bit. The trick is to use the proper tool for the problem at hand: Node.js might require different strategies for dealing with CPU-intensive tasks, or might even not be the right tool for the job.
Let me check if I got this.
Node.js is single-threaded, so its code can't run in parallel, but its I/O can be concurrent. We use asynchronous javascript functions for that. So that's why I/O is non-blocking.
To manage the incoming requests, Node implements the "event loop".
So I/O is non-blocking because Node can do something else instead of waiting for some I/O to finish.
If a request can take too long to answer, Node will assign that request a thread from the thread pool.
(from this point forward I am not sure I got it correctly)
So the callbacks of the simple requests that come in after the complex request's callback will take some time to respond, because the complex request's callback takes a lot of time.
A lot of complex requests, each inside its own asynchronous function. If each request takes, say, 1 sec to respond and we have 10000 responses, the time sums up. They all eventually sum up inside the single-threaded Node that uses the event loop. Inside the event loop, each callback that takes a lot of time to respond is queued behind another callback that takes a lot of time to respond.
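If I understood that correctly, a toy example of what I mean would be something like this (my own sketch, not taken from any article): five callbacks that each block for about a second end up queued one behind the other, so the last one only finishes after roughly five seconds.

```js
// Each callback blocks the single thread for ~1 second of synchronous work.
function blockForOneSecond() {
  const end = Date.now() + 1000;
  while (Date.now() < end) { /* burn CPU synchronously */ }
}

const start = Date.now();
for (let i = 1; i <= 5; i++) {
  setImmediate(() => {
    blockForOneSecond();
    // The i-th callback only finishes after ~i seconds, because it had to
    // wait for all the earlier blocking callbacks in the queue.
    console.log('callback', i, 'done after', Date.now() - start, 'ms');
  });
}
```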
I think the above describes Grant's problem here. That was the first article that I read about node's cons, and I still don't know if I got it correctly. So:
- Grant found herself with a lot of requests that took time because an Amazon service was slow,
- and then the event loop killed everything.
I don't know if I got this correctly. Please feel free to point out my errors and help me get the whole thing right.
Thanks