Within an infinite loop, I am listening on 100+ file descriptors using select. If an fd has packets ready to be read, I notify the packet-processor thread assigned to that file descriptor, and I don't set the bit for that file descriptor in the next round until I receive a notification from the processor thread saying it is done. I wonder how inefficient my code would be if I don't recalculate the max fd for select every time I clear or set a file descriptor in the set. I expect the file descriptors to be nearly contiguous, and the data arrival rate to be a few thousand bytes per second for each fd.
Answer 1:
You should really use poll instead of select. Both are standard, but poll is easier to use, does not place a limit on the number of file descriptors you can check (whereas select limits you to the compile-time constant FD_SETSIZE), and is more efficient. If you do use select, you can always pass FD_SETSIZE as the first argument, but this will of course give worst-case performance since the kernel has to scan the whole fd_set; passing the actual max+1 allows a shorter scan, but it is still not as efficient as the array passed to poll.
For what it's worth, these days it seems stylish to use the nonstandard Linux epoll or whatever the BSD equivalent is. These interfaces may have some advantages if you have a huge number (on the order of tens of thousands) of long-lived (at least several round trips) connections, but otherwise performance will not be noticeably better (and, at the lower end, may be worse). They are also non-portable and, in my opinion, harder to use correctly than the plain, portable poll.
Answer 2:
In principle it is important to give a good max fd to select (though with only a few hundred file descriptors in your process it does not matter much).
But select is becoming obsolete, precisely because of that max fd: the kernel takes O(m) time, where m is the max fd, so select can be costly when used on a small set of file descriptors whose maximum m is large. Use poll(2) instead, which, given a set of n file descriptors, takes O(n) time, independent of the maximal file descriptor m.
Current Linux systems and processes can have many tens of thousands of file descriptors. Read about the C10K problem.
And you might use some event loop, e.g. libraries like libevent or libev (which may use poll internally, and may use more operating-system-specific facilities like epoll etc., abstracting them behind a convenient interface).