I ask this question after trying my best to research the best way to implement a message queue server. Why do operating systems put limits on the number of open file descriptors a process, and the system as a whole, can have?

My current server implementation uses ZeroMQ and opens a subscriber socket for each connected WebSocket client, so obviously that single process can only handle clients up to the file descriptor limit. When I research the topic I find lots of information on how to raise system limits to levels as high as 64k fds, but it never mentions how that affects system performance, or why the default is 1k or lower in the first place.

My current approach is to dispatch messaging to all clients using a coroutine in its own loop, with a map of all clients and their subscription channels. But I would love to hear a solid answer about file descriptor limits and how they affect applications that use them on a per-client basis with persistent connections.
It may be because a file descriptor value is an index into a file descriptor table, so the number of possible file descriptors determines the size of the table. Average users would not want half of their RAM used up by a file descriptor table that can handle millions of file descriptors they will never need.
On Unix systems, the fork() and fork()/exec() process-creation idiom requires iterating over all potential file descriptors of the process and attempting to close each one, typically leaving only a few descriptors such as stdin, stdout, and stderr untouched or redirected somewhere else.
Since this is the Unix API for launching a process, it has to be done any time a new process is created, including for each and every non-built-in command invoked within shell scripts.
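A minimal sketch of that idiom, with a made-up helper name and "/bin/true" standing in as a placeholder command just so there is something to exec; the point is that the cost of the close loop grows with whatever the descriptor limit happens to be:

```c
/* Sketch of the fork()/exec() idiom described above: the child walks
 * every potential descriptor up to the per-process limit and closes it,
 * keeping only stdin, stdout and stderr. "/bin/true" is only a
 * placeholder command for the example. */
#include <sys/types.h>
#include <unistd.h>

pid_t spawn_placeholder_child(void)
{
    pid_t pid = fork();
    if (pid == 0) {                          /* child */
        long max_fd = sysconf(_SC_OPEN_MAX);
        if (max_fd < 0)
            max_fd = 1024;                   /* fall back to a common default */
        for (long fd = 3; fd < max_fd; fd++) /* cost grows with the limit */
            close((int)fd);                  /* EBADF on unused slots is harmless */
        execl("/bin/true", "true", (char *)NULL);
        _exit(127);                          /* only reached if exec failed */
    }
    return pid;                              /* parent: child pid, or -1 on error */
}
```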
Other factors to consider are that while some software may use sysconf(_SC_OPEN_MAX) to dynamically determine the number of files that may be opened by a process, a lot of software still uses the C library's default FD_SETSIZE, which is typically 1024 descriptors, and as such can never have more than that many files open regardless of any administratively defined higher limit.

Unix has a legacy asynchronous I/O mechanism based on file descriptor sets, which use bit offsets to represent the files to wait on and the files that are ready or in an exception condition. It doesn't scale well for thousands of files, as these descriptor sets need to be set up and cleared each time around the run loop. Newer non-standard APIs have appeared on the major Unix variants, including kqueue() on *BSD and epoll() on Linux, to address the performance shortcomings when dealing with a large number of descriptors.
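To make that FD_SETSIZE ceiling and the per-pass setup cost concrete, here is a small illustrative select() loop (the wait_readable() helper and its single-descriptor scope are just for the example): a descriptor whose numeric value is at or above FD_SETSIZE simply cannot be placed in an fd_set, no matter how high the administrative limit is, and the set has to be rebuilt on every pass.

```c
/* Illustrative select() usage: fd_set is a fixed-size bitmap of
 * FD_SETSIZE (commonly 1024) bits, so descriptors numbered at or above
 * that value cannot be waited on at all, and the set must be
 * repopulated before each call. */
#include <sys/select.h>
#include <stdio.h>

int wait_readable(int listen_fd)
{
    if (listen_fd >= FD_SETSIZE) {
        fprintf(stderr, "fd %d does not fit in an fd_set\n", listen_fd);
        return -1;
    }

    fd_set readfds;
    FD_ZERO(&readfds);              /* the whole bitmap is cleared... */
    FD_SET(listen_fd, &readfds);    /* ...and repopulated on every pass */

    /* Blocks until listen_fd is readable; returns the number of ready fds. */
    return select(listen_fd + 1, &readfds, NULL, NULL, NULL);
}
```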
It is important to note that select()/poll() is still used by a lot of software, as it has long been the POSIX API for asynchronous I/O. The modern POSIX asynchronous I/O approach is now the aio_* API, but it is likely not competitive with the kqueue() or epoll() APIs. I haven't used aio in anger, and it certainly wouldn't have the performance and semantics offered by the native approaches, in the way they can aggregate multiple events for higher performance. kqueue() on *BSD has really good edge-triggered semantics for event notification, allowing it to replace select()/poll() without forcing large structural changes to your application. Linux epoll() follows the lead of *BSD kqueue() and improves upon it, which in turn followed the lead of Sun/Solaris event ports.

The upshot is that increasing the number of allowed open files across the system adds both time and space overhead for every process in the system, even if a process can't make use of those descriptors because of the API it is using. There are also aggregate system-wide limits on the number of open files allowed. This older but interesting tuning summary for 100k-200k simultaneous connections using nginx on FreeBSD provides some insight into the overheads of maintaining open connections, and so does another one covering a wider range of systems but "only" seeing 10K connections as the Mt Everest.
Probably the best reference for Unix systems programming is W. Richard Stevens' Advanced Programming in the UNIX Environment.
For performance purposes, the open file table needs to be statically allocated, so its size needs to be fixed. File descriptors are just offsets into this table, so all the entries need to be contiguous. You can resize the table, but this requires halting all threads in the process, allocating a new block of memory for the file table, and then copying all entries from the old table to the new one. It's not something you want to do dynamically, especially when the reason you're doing it is that the old table is full!
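As a concrete illustration of the per-process side of those limits, here is a small sketch (the helper name is my own) that inspects and raises the process's RLIMIT_NOFILE soft limit up to the administratively set hard limit, which is the knob behind "ulimit -n":

```c
/* Sketch: inspect and raise this process's own descriptor limit
 * (RLIMIT_NOFILE) up to the hard limit set by the administrator.
 * Raising it beyond the hard limit requires privileges and is not
 * attempted here. */
#include <sys/resource.h>
#include <stdio.h>

int raise_fd_limit(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    printf("soft limit: %llu, hard limit: %llu\n",
           (unsigned long long)rl.rlim_cur,
           (unsigned long long)rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;        /* request the full hard limit */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```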
There are certain operations which slow down when you have lots of potential file descriptors. One example is the operation "close all file descriptors except stdin, stdout, and stderr" -- the only portable* way to do this is to attempt to close every possible file descriptor except those three, which can become a slow operation if you could potentially have millions of file descriptors open.

*: If you're willing to be non-portable, you can look in /proc/self/fd -- but that's beside the point.

This isn't a particularly good reason, but it is a reason. Another reason is simply to keep a buggy program (i.e., one that "leaks" file descriptors) from consuming too many system resources.
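For what it's worth, here is a Linux-only sketch of that /proc/self/fd shortcut (the helper name is made up, and it assumes no other thread is creating descriptors while it runs): only the descriptors that are actually open get visited, rather than every possible descriptor number.

```c
/* Linux-only sketch: close every open descriptor except
 * stdin/stdout/stderr by listing /proc/self/fd, instead of iterating
 * over every possible descriptor number. The directory descriptor used
 * for the listing is skipped in the loop and closed last by closedir(). */
#include <dirent.h>
#include <stdlib.h>
#include <unistd.h>

void close_non_stdio_fds(void)
{
    DIR *dir = opendir("/proc/self/fd");
    if (dir == NULL)
        return;                          /* fall back to the portable close loop */

    int dir_fd = dirfd(dir);
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        int fd = atoi(entry->d_name);    /* "." and ".." parse to 0 and are skipped */
        if (fd > 2 && fd != dir_fd)
            close(fd);
    }
    closedir(dir);                       /* also closes dir_fd */
}
```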