I'm trying to understand what makes Nginx so fast, and I have a few questions.
As I understand it, Apache either spawns a new process to serve each request OR spawns a new thread to serve each request. Since each new thread shares the virtual address space, memory usage keeps climbing if a number of concurrent requests come in.
Nginx solves this by having just one listening (master) process with a single thread of execution, AND 2 or 3 (the number is configurable) worker processes. This master process/thread runs an event loop, effectively waiting for any incoming request. When a request comes in, it gives that request to one of the worker processes.
Please correct me if my above understanding is not correct
If the above is correct, then I have a few questions:
1.) Isn't the worker process going to spawn multiple threads and run into the same problem as Apache?
2.) Or is nginx fast because its event-based architecture uses non-blocking I/O underneath it all? Maybe the worker processes spawn threads which do only non-blocking I/O, is that it?
3.) What "exactly" is an "event-based architecture"? Can someone really simplify it, for someone like me to understand? Does it just pertain to non-blocking I/O, or to something else as well?
I got a reference to C10K, and I am trying to go through it, but I don't think it's about event-based architecture; it seems to be more about non-blocking I/O.
Apache doesn't spawn a new thread for each request. It maintains a cache of threads, or a group of pre-forked processes, to which it farms out requests. The number of concurrent requests is limited by the number of children/threads, yes, but Apache is not spawning a new thread/child for every request, which would be ridiculously slow (even with threads, creation and teardown for every request would be far too slow).
Nginx uses a master-worker model. The master process deals with loading the configuration and creating/destroying/maintaining workers. Like Apache, it starts out with a number of pre-forked processes already running, each of which is a worker (and one of which is the "master" process). ALL the worker processes share a set of listening sockets. Each worker accepts connections and processes them, but each worker can handle THOUSANDS of connections at once, unlike Apache, which can only handle one connection per worker.
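To make the pre-fork idea concrete, here is a toy sketch in Python (using `os.fork()`, so POSIX only). A parent binds one listening socket, forks a fixed pool of workers, and every worker accepts from that shared socket. This is just an illustration of the model, not how nginx is actually implemented (nginx is written in C, and its workers run event loops rather than serving one connection at a time):

```python
import os
import socket

def serve_one(listener: socket.socket) -> None:
    # Each worker blocks in accept() on the *shared* listening socket;
    # the kernel hands each incoming connection to exactly one worker.
    conn, _ = listener.accept()
    conn.sendall(b"hello from worker %d\n" % os.getpid())
    conn.close()

def main():
    listener = socket.socket()
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))   # ephemeral port, just for the demo
    listener.listen(16)
    port = listener.getsockname()[1]

    # Pre-fork a small, fixed pool of workers, like nginx's master does.
    workers = []
    for _ in range(2):
        pid = os.fork()
        if pid == 0:            # child: act as a worker, then exit
            serve_one(listener)
            os._exit(0)
        workers.append(pid)

    # Drive one request per worker from the parent to show it working.
    for _ in workers:
        c = socket.create_connection(("127.0.0.1", port))
        print(c.recv(1024).decode().strip())
        c.close()
    for pid in workers:
        os.waitpid(pid, 0)

if __name__ == "__main__":
    main()
```

The key point the sketch shows: the socket is bound and listened on once, before forking, so no coordination is needed between workers to share it.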
The way nginx achieves this is through "multiplexing". It doesn't use libevent; it uses a custom event loop which was designed specifically for nginx and grew along with the development of the nginx software. Multiplexing works by using a loop to step through the program chunk by chunk, operating on one piece of data/new connection/whatever per connection/object per loop iteration. It is all based on backends like epoll(), kqueue() and select(), which you should read up on.
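To make "multiplexing" concrete, here is a minimal event-loop echo server in Python. The stdlib `selectors` module does exactly the backend selection described above: `DefaultSelector` picks epoll on Linux, kqueue on BSD/macOS, and falls back to select() elsewhere. This is a sketch of the technique, not nginx's actual loop:

```python
import selectors
import socket

# DefaultSelector picks the best backend the OS offers:
# epoll on Linux, kqueue on BSD/macOS, select() as a fallback.
sel = selectors.DefaultSelector()

def accept(listener):
    conn, _ = listener.accept()
    conn.setblocking(False)
    # Register the new connection; the loop below is *told* when it has
    # data, instead of a thread sitting blocked in recv() for it.
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    data = conn.recv(4096)     # the socket is ready, so this won't block
    if data:
        conn.sendall(data)     # echo it back (tiny writes, assumed to fit)
    else:                      # empty read = peer closed the connection
        sel.unregister(conn)
        conn.close()

def run(listener, iterations):
    listener.setblocking(False)
    sel.register(listener, selectors.EVENT_READ, accept)
    # One loop iteration handles one batch of ready sockets; thousands of
    # idle connections cost nothing while they sit waiting in the kernel.
    for _ in range(iterations):
        for key, _ in sel.select(timeout=1):
            key.data(key.fileobj)   # dispatch to accept() or echo()
```

A single thread running `run()` can juggle many connections, because it only ever touches sockets the kernel has reported as ready.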
Apache uses multiple threads to provide each request with its own thread of execution. This is necessary to avoid blocking when using synchronous I/O.
Nginx uses only asynchronous I/O, which makes blocking a non-issue. The only reason nginx uses multiple processes is to make full use of multi-core, multi-CPU and hyper-threading systems. Even with SMP support, the kernel cannot schedule a single thread of execution over multiple CPUs. It requires at least one process or thread per logical CPU.
So the difference is: nginx requires only enough worker processes to get the full benefit of SMP, whereas Apache's architecture necessitates creating a new thread (each with its own stack of around ~8MB) per request. Obviously, at high concurrency, Apache will use much more memory and suffer greater overhead from maintaining large numbers of threads.
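This is visible directly in nginx's configuration. These are standard nginx.conf directives (`worker_processes` and `worker_connections` are real, documented settings; the value 1024 below is just a common default, not a recommendation):

```nginx
worker_processes auto;          # one worker per logical CPU

events {
    worker_connections 1024;    # concurrent connections *per worker*
}
```

Note the contrast with Apache: the process count tracks CPUs, not clients, and the per-worker connection limit is what scales with load.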
It's not very complicated from a conceptual point of view. I'll try to be clear, but I have to do some simplification.
The event-based servers (like nginx and lighttpd) use a wrapper around an event monitoring system. For example, lighttpd uses libevent to abstract the more advanced high-speed event monitoring systems (see libev also).
The server keeps track of all the non-blocking connections it has (both writing and reading) using a simple state machine for each connection. The event monitoring system notifies the server process when there is new data available or when it can write more data. It's like select() on steroids, if you know socket programming. The server process then simply sends the requested file using some advanced function like sendfile() where possible, or turns the request over to a CGI process using a socket for communication (this socket will be monitored with the event monitoring system like the other network connections). This link has a lot of great information about the internals of nginx, just in case. I hope it helps.
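A quick sketch of the sendfile() idea, using Python's `os.sendfile()` wrapper around the syscall. The point is that the kernel moves file pages straight to the socket, with no read()/write() round trip through userspace buffers. The socketpair demo below is Linux-oriented and purely illustrative:

```python
import os
import socket
import tempfile

def send_file(conn: socket.socket, path: str) -> int:
    # os.sendfile() asks the kernel to copy file data directly to the
    # socket, skipping the usual read-into-buffer / write-from-buffer hop.
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        sent = 0
        while sent < size:   # sendfile may send less than asked; loop
            sent += os.sendfile(conn.fileno(), f.fileno(), sent, size - sent)
        return sent

if __name__ == "__main__":
    # Demo over a local socket pair (on Linux, sendfile to a Unix-domain
    # stream socket works; this is a sketch, not portable code).
    a, b = socket.socketpair()
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 1000)
    send_file(a, tmp.name)
    a.close()
    received = b""
    while chunk := b.recv(4096):
        received += chunk
    print(len(received))
    os.unlink(tmp.name)
```

In a real event-based server this call sits behind the event loop: the file is sent in non-blocking chunks each time the monitoring system reports the socket as writable.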