I have used Node.js for a while now and I just realized it can be blocking. I just cannot wrap my brain around the conditions under which Node.js becomes blocking.
- So, Node.js is single-threaded because (i) JavaScript is and (ii) it avoids all the multi-threaded pitfalls.
- To do a lot of things at once, despite being single-threaded, it implements asynchronous execution. So, talking with the DB (the I/O in general) is non-blocking (because it is asynchronous).
- But all the incoming requests to do some work (i.e. talk with the DB) and all the results of that work that must go back to the client (i.e. send some data) use that single thread.
- Node.js uses the "event loop" inside that single thread to get all the requests and assign them to non-blocking I/O tasks.
So the I/O tasks are non-blocking because of asynchronous callbacks, but the single thread can be blocking, because it's synchronous and because the event loop can get jammed when a lot of complicated requests show up at the same time?
- Am I right, did I understand this correctly? I guess I don't, because here and here they emphasize that "Node is single-threaded which means none of your code runs in parallel". What does that actually mean, and how does it make Node blocking?
- So, does the event loop run forever and constantly search for requests, or does it start executing only after it spots a new request?
- Does the Node blocking weakness render Node useless for big projects and make it suitable only for micro-sites and small projects?
Thanks a lot.
First, to be clear: node.js as a whole isn't single-threaded. Node does have a thread pool via libuv that it uses to perform some tasks that are either currently impossible to do efficiently from a single thread on most platforms (e.g. file I/O) or are inherently computation intensive (e.g. zlib). It should be noted that most of the crypto module (which would also be inherently computation intensive) currently does not have an async/non-blocking interface (except for crypto.randomBytes()). V8 also utilizes multiple threads to do things like garbage collection, optimization of functions, etc.
However just about everything else in node does occur within the same, single thread.
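To make that split concrete, here is a minimal sketch (not part of the original answer) of two calls that libuv hands off, so the single JS thread keeps running while they complete in the background:

```js
const fs = require('fs');
const crypto = require('crypto');

// File I/O is performed on libuv's thread pool; the callback is queued back
// onto the single JS thread when the read is done.
fs.readFile(__filename, 'utf8', (err, data) => {
  if (err) throw err;
  console.log('file read finished,', data.length, 'characters');
});

// The async form of crypto.randomBytes() is likewise non-blocking.
crypto.randomBytes(16, (err, buf) => {
  if (err) throw err;
  console.log('random bytes ready:', buf.toString('hex'));
});

console.log('this prints first: the JS thread never waited');
```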
Now to address your questions specifically:
The fact that the javascript code is run from a single thread doesn't make node block. As this answer explains, node is foremost about (I/O) concurrency rather than (code) parallelism. You could run node code in parallel by utilizing the built-in cluster module, for example, on a multi-core/CPU system, but node's primary goal is to be able to handle a lot of I/O concurrently without dedicating one thread per socket/server/etc.

There is a good, detailed writeup here that describes how the event loop in node works.
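A minimal sketch of that cluster approach, assuming a multi-core machine (the port number is arbitrary):

```js
const cluster = require('cluster');
const http = require('http');
const os = require('os');

if (cluster.isMaster) {
  // Fork one worker per CPU core; each worker is a separate process,
  // so their javascript really does run in parallel.
  for (let i = 0; i < os.cpus().length; i++) {
    cluster.fork();
  }
} else {
  // All workers share the same listening port; incoming connections are
  // distributed among them.
  http.createServer((req, res) => {
    res.end('handled by worker ' + process.pid + '\n');
  }).listen(8000);
}
```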
Node's primary goal as previously described is to handle I/O really well, which fits with the majority of use cases for web applications and any kind of network programs for example.
If your script is CPU-bound (e.g. you're calculating pi or transcoding audio/video), you are probably better off delegating that work to a child process in node (e.g. calling out to ffmpeg for transcoding instead of doing it in javascript or synchronously in a C++ node addon on node's main thread). You could do these blocking things in-process if you aren't doing anything else at the same time (like handling HTTP requests). There are many people who will use node in this way for performing various utility tasks where I/O concurrency isn't as important. One example of this might be a script that performs minification, linting, and/or bundling of js and css files, or a script that creates thumbnails from a large set of images.

However, if your script instead creates a TCP or HTTP server for example that pulls information from a database, formats it, and sends it back to the user, then node will be good at doing that because the majority of the time spent in the process is just waiting for sockets/HTTP clients to send (more) data and waiting for the database to reply with results from queries.
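As a rough sketch of the "delegate it to a child process" idea above, assuming ffmpeg is installed on the machine (the flags and file names here are only placeholders):

```js
const { spawn } = require('child_process');

// Transcoding happens in a separate ffmpeg process, so node's single JS
// thread stays free to keep serving requests in the meantime.
const ffmpeg = spawn('ffmpeg', ['-i', 'input.mp4', 'output.webm']);

ffmpeg.on('close', (code) => {
  console.log('ffmpeg exited with code', code);
});
```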
Let's get straight to the answers.
JSON encoding is basically string manipulation. That can be slow in a lot of languages, not just JavaScript. So if encoding such a JSON takes 20 seconds, you will load the CSV file (asynchronously), but then you'll spend 20 seconds manipulating strings. During that time nothing else can come in: no other callbacks, no other requests that you could send to the database/file system in the meantime. None of your code runs except that single JSON.stringify() call.
There are ways around this particular problem, but you should be aware of it: if a single function or a single statement like JSON.stringify takes a long time, it will block. You need to program your apps with that in mind.
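To make that concrete, here is a small self-contained sketch (the object size is arbitrary and may need adjusting on your machine): a timer that is due in 10 ms can only fire once the synchronous JSON.stringify call has finished.

```js
// Build a reasonably large object to serialize.
const big = { rows: [] };
for (let i = 0; i < 1e6; i++) {
  big.rows.push({ id: i, name: 'row ' + i, values: [i, i * 2, i * 3] });
}

const start = Date.now();

setTimeout(() => {
  // Due after 10 ms, but it cannot run until the stringify below is done.
  console.log('timer fired after', Date.now() - start, 'ms');
}, 10);

// Synchronous: nothing else (timers, I/O callbacks, new requests) runs
// until this single statement returns.
const json = JSON.stringify(big);
console.log('stringify took', Date.now() - start, 'ms for', json.length, 'characters');
```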
Say you're parsing that JSON from 1) above, and in the meantime you receive 5 new requests for that or other stuff. Those 5 requests go straight to the queue, and as each callback finishes, the event loop checks for the next one to be processed. If there aren't any, it waits.
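To see that queueing in action, here is a toy HTTP server (my own sketch; the port and timings are arbitrary) where one deliberately blocking route makes every other request wait:

```js
const http = require('http');

// Simulate a CPU-bound handler by spinning synchronously for `ms` milliseconds.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* burn CPU synchronously */ }
}

http.createServer((req, res) => {
  if (req.url === '/slow') {
    blockFor(2000);           // blocks the single JS thread for ~2 seconds
    res.end('slow done\n');
  } else {
    res.end('fast done\n');   // normally answers almost instantly
  }
}).listen(3000);
```

If you request /slow and then immediately /fast (e.g. with two curl commands), the fast response still takes about two seconds, because its callback sits in the queue until the slow one has released the thread.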
Node is already used in large-scale projects; I'm sure you can find many if you google a bit. The trick is to use the proper tool for the problem at hand: Node.js might require different strategies for dealing with CPU-intensive tasks, or might even not be the right tool for the job.
Let me check if I got this.
Node.js is single-threaded, so its code can't run in parallel, but its I/O can be concurrent. We use asynchronous javascript functions for that. So that's why I/O is non-blocking.
To manage the incoming requests, Node implements the "event loop".
So I/O is non-blocking because Node can do something else instead of waiting for some I/O to finish.
If a request can take too long to answer, Node will assign that request a thread from the thread pool.
(from this point forward I am not sure I got it correctly)
So the callbacks of the simple requests that come in after the complex request's callback will take some time to respond, because the complex request's callback takes a lot of time.
A lot of complex requests, each inside its own asynchronous function. If each request takes, say, 1 sec to respond and we have 10000 responses, the time sums up. They all eventually sum up inside the single-threaded Node that uses the event loop. Inside the event loop, each callback that takes a lot of time to respond is queued behind another callback that takes a lot of time to respond.
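If I understood that correctly, a toy example of what I mean would be something like this (my own sketch, not taken from any article): five callbacks that each block for about a second end up queued one behind the other, so the last one only finishes after roughly five seconds.

```js
// Each callback blocks the single thread for ~1 second of synchronous work.
function blockForOneSecond() {
  const end = Date.now() + 1000;
  while (Date.now() < end) { /* burn CPU synchronously */ }
}

const start = Date.now();
for (let i = 1; i <= 5; i++) {
  setImmediate(() => {
    blockForOneSecond();
    // The i-th callback only finishes after ~i seconds, because it had to
    // wait for all the earlier blocking callbacks in the queue.
    console.log('callback', i, 'done after', Date.now() - start, 'ms');
  });
}
```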
I think the above describes Grant's problem here. That was the first article that I read about node's cons, and I still don't know if I got it correctly. So:
- Grant found herself with a lot of requests that took time because an Amazon service was slow,
- and then the event loop killed everything.
I don't know if I got this correctly. Please feel free to point out my errors and help me get the whole thing right.
Thanks