I created a simple application that accepts IPv4 TCP connections using select() and accept().
I use a Python script to test it. It opens 100 connections in sequence, i.e.:
for i in range(100):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print s.connect((IP, PORT))
    s.send("Test\r\n")
What I observe is that my application gets stuck in select() for 2 seconds after the first X connections.
Output from strace:
1344391414.452208 select(30, [3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29], NULL, NULL, NULL) = 1 (in [3])
1344391416.742843 accept(3, 0, NULL) = 30
My code follows. Any idea what I am doing wrong?
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <syslog.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/select.h>
int
fd_create (void)
{
    int fd;
    int set = true;
    struct sockaddr_in addr;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &set, sizeof(set));
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(1999);
    addr.sin_addr.s_addr = INADDR_ANY;
    bind(fd, (struct sockaddr *)&addr, sizeof(addr));
    listen(fd, 1024);
    return (fd);
}
int
fd_echo (int fd)
{
    int n;
    char buffer[128 + 1];

    while ((n = recv(fd, buffer, 128, 0)) > 0)
        ;
    return (n);
}
int
main (void)
{
    int listen_fd;
    fd_set working;
    fd_set master;
    int max_fd;
    int i;
    int new_fd;
    int rc;
    int con;

    FD_ZERO(&master);
    listen_fd = fd_create();
    fcntl(listen_fd, F_SETFL, fcntl(listen_fd, F_GETFL) | O_NONBLOCK);
    max_fd = listen_fd;
    printf("%d\n", listen_fd);
    FD_SET(listen_fd, &master);
    con = 0;
    for (;;) {
        memcpy(&working, &master, sizeof(fd_set));
        select(max_fd + 1, &working, NULL, NULL, NULL);
        for (i = 0; i <= max_fd; i++) {
            if (FD_ISSET(i, &working)) {
                if (i == listen_fd) {
                    while ((new_fd = accept(i, NULL, NULL)) >= 0) {
                        fcntl(new_fd, F_SETFL, fcntl(new_fd, F_GETFL) | O_NONBLOCK);
                        FD_SET(new_fd, &master);
                        if (max_fd < new_fd) {
                            max_fd = new_fd;
                        }
                        printf("New connection %d (%d)\n", new_fd, ++con);
                    }
                    if ((new_fd == -1) && (errno != EAGAIN && errno != EWOULDBLOCK)) {
                        return(0);
                    }
                } else {
                    rc = fd_echo(i);
                    if ((rc == 0) ||
                        ((rc == -1) && ((errno != EAGAIN && errno != EWOULDBLOCK)))) {
                        close(i);
                        FD_CLR(i, &master);
                    }
                }
            }
        }
    }
    return (0);
}
UPDATE/WARNING: while trying to prove this answer applies, I found that maybe it doesn't. I ran the test and got delays without max_fd ever getting higher than 300. And I got delays with poll() too. So I tried tcpdump and there were retransmissions. It looks like even 127.0.0.1 can drop packets when you throw them at it this fast. Leaving the answer here because it is a real issue, even if it's not the most pressing one.
So this involves a lot of file descriptors, and it works with poll but not select. With those clues I can see the explanation: you've gone over the FD_SETSIZE limit.
The official pronouncement from POSIX is (referring to FD_ZERO/FD_SET/FD_CLR/FD_ISSET):
The behavior of these macros is undefined if the fd argument is less than 0 or greater than or equal to FD_SETSIZE, or if fd is not a valid file descriptor, or if any of the arguments are expressions with side-effects.
(from http://pubs.opengroup.org/onlinepubs/9699919799/functions/select.html)
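A cheap defensive measure that follows from that wording is to check the descriptor against FD_SETSIZE before touching the set. This is only a sketch: fd_set_checked is a name I made up, not a standard function, and simply refusing the descriptor is one possible policy (a real server would probably close the connection as well).

```c
#include <assert.h>
#include <sys/select.h>

/* Hypothetical helper: add fd to set only when FD_SET is well defined.
 * Returns 0 on success, -1 if fd is outside [0, FD_SETSIZE). */
static int
fd_set_checked (int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE) {
        return (-1);
    }
    FD_SET(fd, set);
    return (0);
}
```

In an accept loop this would replace the bare FD_SET(new_fd, &master), with the -1 case handled by closing new_fd.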
To really understand what happened you have to look deeper than the official specification, into the actual implementation of the fd_set type. It has a split personality. In the kernel, where select is implemented, it's treated as a variable-length array of bits. The first argument to select is used to decide where the array ends. If you call select(2048, ...) the kernel will expect each non-NULL fd_set * to point to an array of 256 bytes (2048 bits).
But in userspace, fd_set is a fixed-size struct. The size is FD_SETSIZE bits, which is 1024 on my system and probably yours too. FD_SET and the other macros are basically just doing assignments to elements of the array, only they're a little more complicated because they have to deal with the conceptual array elements being the individual bits. So if one of your file descriptors is 1024 and you try to FD_SET it, you've done the equivalent of
int array[1024];
array[1024] = 1;
In other words, you clobbered whatever was in memory after the fd_set, causing weird things to happen later.
There are ways around this. I've seen old code that does a #define FD_SETSIZE somebignumber before including the header that defines fd_set. I don't know what OSes that worked on; I just tried it and glibc seems to ignore it.
A better possibility is to do something like the old "struct hack" where you'd allocate a struct with more memory than its sizeof, and the extra memory would be usable as extra elements in the array that was the last member of the struct.
fd_set *rfds = malloc(128 + sizeof *rfds); /* can hold fds up to FD_SETSIZE+128*8-1 */
Now of course you need to remember to free it when you're done with it, and pass rfds instead of &rfds to select and the FD_* macros, and do your own memset instead of FD_ZERO, and hope that the kernel implementation doesn't change since you're now real chummy with it. But it works... for now.
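For what it's worth, here is a rough sketch of sizing such an allocation from the highest descriptor count you expect rather than a hard-coded 128. The names big_fd_set_bytes and big_fd_set_alloc are hypothetical, and rounding up to whole longs is my assumption about what the kernel wants to read, not something the specification promises.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/select.h>

/* Bytes needed for a bit set covering descriptors 0..nfds-1, rounded up to
 * whole longs and never smaller than a regular fd_set. */
static size_t
big_fd_set_bytes (int nfds)
{
    size_t bits_per_long = sizeof(long) * CHAR_BIT;
    size_t longs = ((size_t)nfds + bits_per_long - 1) / bits_per_long;
    size_t bytes = longs * sizeof(long);

    return (bytes < sizeof(fd_set) ? sizeof(fd_set) : bytes);
}

/* Zero-filled allocation, so no separate memset is needed at creation.
 * The caller must free() the result. */
static fd_set *
big_fd_set_alloc (int nfds)
{
    return (calloc(1, big_fd_set_bytes(nfds)));
}
```

One more caveat, as far as I know: glibc built with _FORTIFY_SOURCE range-checks the index inside FD_SET and aborts for descriptors at or above FD_SETSIZE, so with this trick you may end up setting the bits by hand.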
Using poll() is probably the correct answer.
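To show why poll sidesteps the limit, here is a minimal sketch: the descriptors live in an ordinary array of struct pollfd that you size yourself, so the numeric value of a descriptor doesn't matter. first_readable is a hypothetical helper for illustration, not a replacement for a full event loop.

```c
#include <assert.h>
#include <poll.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns the index of the first descriptor in fds[0..count-1] that is
 * readable (or has hung up or errored), or -1 on timeout or failure. */
static int
first_readable (const int *fds, size_t count, int timeout_ms)
{
    struct pollfd *pfds;
    size_t i;
    int found = -1;

    pfds = calloc(count, sizeof(*pfds));
    if (pfds == NULL) {
        return (-1);
    }
    for (i = 0; i < count; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
    }
    if (poll(pfds, (nfds_t)count, timeout_ms) > 0) {
        for (i = 0; i < count; i++) {
            if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
                found = (int)i;
                break;
            }
        }
    }
    free(pfds);
    return (found);
}
```

A real server would keep the pollfd array around between calls instead of rebuilding it each time; this version trades that for brevity.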
So further debugging of the kernel...
The packet gets dropped in "tcp_v4_syn_recv_sock()" because "sk_acceptq_is_full(sk)" returns true.
"sk->sk_ack_backlog" is 11 and the configured "sk->sk_max_ack_backlog" is 10. (We set this in the listen() call.)
(Updating based on EJP notes.)
So I guess what is happening is:
The client blocks on connect() and a SYN is sent to the server. The kernel gets the SYN and sends a SYN/ACK. The client gets back the SYN/ACK, unblocks, and a) sends the ACK and b) sends a new SYN for the next connection.
Server receives the ACK and puts the connection in the backlog.
Do that 10 times and we are stuck.
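Since the drops come from the accept queue filling up, the obvious knob on the server side is the backlog passed to listen(). Below is a sketch of creating the listener with the backlog as an explicit parameter and with return values checked; listener_create is a hypothetical name. On Linux the kernel silently caps the backlog at net.core.somaxconn, so a large number here is a request, not a guarantee.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns a listening IPv4 TCP socket bound to port
 * (0 lets the kernel pick one), or -1 if any step fails. */
static int
listener_create (unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int set = 1;
    int fd;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return (-1);
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &set, sizeof(set)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return (-1);
    }
    return (fd);
}
```

Something like listener_create(1999, 1024) would stand in for fd_create() from the question; draining accept() promptly matters just as much as the number itself.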
Good one, guys. I wouldn't have understood it without your help. Thanks!!