I have a multi-threaded server (thread pool) that handles a large number of requests (up to 500/sec for one node) using 20 threads. A listener thread accepts incoming connections and queues them for the handler threads to process. Once the response is ready, the handler thread writes it out to the client and closes the socket. All seemed fine until recently, when a test client program started hanging randomly after reading the response. After a lot of digging, it seems that the close() on the server is not actually disconnecting the socket. I've added some debugging prints to the code with the file descriptor number, and I get this type of output.
Processing request for 21
Writing to 21
Closing 21
The return value of close() is 0, or there would be another debug statement printed. After this output with a client that hangs, lsof is showing an established connection.
SERVER 8160 root 21u IPv4 32754237 TCP localhost:9980->localhost:47530 (ESTABLISHED)
CLIENT 17747 root 12u IPv4 32754228 TCP localhost:47530->localhost:9980 (ESTABLISHED)
It's as if the server never sends the shutdown sequence to the client. This state persists until the client is killed, leaving the server side in the CLOSE_WAIT state:
SERVER 8160 root 21u IPv4 32754237 TCP localhost:9980->localhost:47530 (CLOSE_WAIT)
Also, if the client has a timeout specified, it will time out instead of hanging. I can also manually run
call close(21)
in the server from gdb, and the client will then disconnect. This happens maybe once in 50,000 requests, but might not happen for extended periods.
Linux version: 2.6.21.7-2.fc8xen Centos version: 5.4 (Final)
socket actions are as follows
SERVER:
int client_socket; struct sockaddr_in client_addr; socklen_t client_len = sizeof(client_addr);
while(true) {
client_socket = accept(incoming_socket, (struct sockaddr *)&client_addr, &client_len);
if (client_socket == -1)
continue;
/* insert into queue here for threads to process */
}
Then the thread picks up the socket and builds the response.
/* get client_socket from queue */
/* processing request here */
/* now set to blocking for write; was previously set to non-blocking for reading */
int flags = fcntl(client_socket, F_GETFL);
if (flags < 0)
abort();
if (fcntl(client_socket, F_SETFL, flags & ~O_NONBLOCK) < 0)
abort();
server_write(client_socket, response_buf, response_length);
server_close(client_socket);
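Switching a descriptor back to blocking mode means masking the O_NONBLOCK bit out of the flags returned by F_GETFL; OR-ing it in would leave the socket non-blocking. A minimal sketch of a helper that makes the direction explicit (set_blocking is a hypothetical name, not part of the server above):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper: switch O_NONBLOCK on or off for fd.
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_blocking(int fd, bool blocking)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (blocking)
        flags &= ~O_NONBLOCK;   /* clear the bit: calls may block */
    else
        flags |= O_NONBLOCK;    /* set the bit: calls fail with EAGAIN instead */
    return fcntl(fd, F_SETFL, flags);
}
```

A handler thread could call set_blocking(client_socket, true) just before writing the response and check the return value instead of calling abort().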
The server_write and server_close functions:
void server_write( int fd, char const *buf, ssize_t len ) {
printf("Writing to %d\n", fd);
while(len > 0) {
ssize_t n = write(fd, buf, len);
if(n <= 0)
return;// I don't really care what error happened, we'll just drop the connection
len -= n;
buf += n;
}
}
void server_close( int fd ) {
for(uint32_t i=0; i<10; i++) {
int n = close(fd);
if(!n) {//closed successfully
return;
}
usleep(100);
}
printf("Close failed for %d\n", fd);
}
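One thing worth noting about the retry loop in server_close: on Linux the descriptor is released even when close() reports an error, so calling close() again on the same number can close a descriptor that another thread has just received from accept(). The following is only a sketch of an alternative, assuming a blocking socket; write_all and close_once are hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly len bytes to a blocking fd, retrying after EINTR and
 * short writes. Returns 0 on success, -1 on failure. */
int write_all(int fd, const char *buf, size_t len)
{
    assert(fd >= 0);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0 && errno == EINTR)
            continue;           /* interrupted before any data was written */
        if (n <= 0)
            return -1;          /* real error: let the caller drop the connection */
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Close fd exactly once and report the result; never retry, because the
 * descriptor number may already belong to another thread. */
int close_once(int fd)
{
    return close(fd);
}
```

With this shape, a failed close_once() is logged and forgotten rather than retried.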
CLIENT:
Client side is using libcurl v 7.27.0
CURL *curl = curl_easy_init();
CURLcode res;
curl_easy_setopt( curl, CURLOPT_URL, url);
curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, write_callback );
curl_easy_setopt( curl, CURLOPT_WRITEDATA, write_tag );
res = curl_easy_perform(curl);
Nothing fancy, just a basic curl connection. The client hangs in transfer.c (in libcurl) because the socket is not perceived as being closed. It's waiting for more data from the server.
Things I've tried so far:
Shutdown before close
shutdown(fd, SHUT_WR);
char buf[64];
while(read(fd, buf, 64) > 0);
/* then close */
Setting SO_LINGER to close forcibly in 1 second
struct linger l;
l.l_onoff = 1;
l.l_linger = 1;
if (setsockopt(client_socket, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) == -1)
abort();
These have made no difference. Any ideas would be greatly appreciated.
EDIT -- This ended up being a thread-safety issue inside a queue library causing the socket to be handled inappropriately by multiple threads.
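The queue library and the exact bug are not shown here, so the following is only a generic sketch of what a thread-safe hand-off between the listener and the handlers might look like, using a mutex and a condition variable so that each descriptor is delivered to exactly one thread. All names (fd_queue, fdq_init, fdq_push, fdq_pop) and the capacity are assumptions made for illustration:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define FDQ_CAPACITY 1024   /* assumed bound, chosen for illustration */

/* Hypothetical bounded FIFO of file descriptors shared between the
 * listener thread and the handler threads. */
struct fd_queue {
    int fds[FDQ_CAPACITY];
    int head;               /* index of the next descriptor to pop */
    int count;              /* number of descriptors currently queued */
    pthread_mutex_t lock;
    pthread_cond_t not_empty;
};

void fdq_init(struct fd_queue *q)
{
    assert(q != NULL);
    q->head = 0;
    q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Called by the listener. Returns 0, or -1 if the queue is full (the
 * caller should then close the connection itself). */
int fdq_push(struct fd_queue *q, int fd)
{
    int rc = -1;
    pthread_mutex_lock(&q->lock);
    if (q->count < FDQ_CAPACITY) {
        q->fds[(q->head + q->count) % FDQ_CAPACITY] = fd;
        q->count++;
        rc = 0;
        pthread_cond_signal(&q->not_empty);
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Called by a handler. Blocks until a descriptor is available; each
 * descriptor is returned to exactly one caller. */
int fdq_pop(struct fd_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int fd = q->fds[q->head];
    q->head = (q->head + 1) % FDQ_CAPACITY;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return fd;
}
```

If two handlers could ever pop the same descriptor, one of them might close it while the other is still writing, or close a descriptor number that accept() has already reused for a new client, which would match the symptoms described above.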
This sounds to me like a bug in your Linux distribution.
The GNU C library documentation says:
Nothing about clearing any error flags or waiting for the data to be flushed or any such thing.
Your code is fine; your O/S has a bug.
Here is some code I've used on many Unix-like systems (e.g SunOS 4, SGI IRIX, HPUX 10.20, CentOS 5, Cygwin) to close a socket:
But the above does not guarantee that any buffered writes are sent.
Graceful close: It took me about 10 years to figure out how to close a socket. But for another 10 years I just lazily called usleep(20000) for a slight delay to 'ensure' that the write buffer was flushed before the close. This obviously is not very clever, because: usleep() (but I usually called usleep() twice to handle this case--a hack).

But doing a proper flush is surprisingly hard. Using SO_LINGER is apparently not the way to go; see for example:

And SIOCOUTQ appears to be Linux-specific.

Note shutdown(fd, SHUT_WR) doesn't stop writing, contrary to its name, and maybe contrary to man 2 shutdown.

This code flushSocketBeforeClose() waits until a read of zero bytes, or until the timer expires. The function haveInput() is a simple wrapper for select(2), and is set to block for up to 1/100th of a second.

Example of use:

In the above, my getWallTimeEpoch() is similar to time(), and Perror() is a wrapper for perror().
Edit: Some comments:
My first admission is a bit embarrassing. The OP and Nemo challenged the need to clear the internal so_error before close, but I cannot now find any reference for this. The system in question was HPUX 10.20. After a failed connect(), just calling close() did not release the file descriptor, because the system wished to deliver an outstanding error to me. But I, like most people, never bothered to check the return value of close. So I eventually ran out of file descriptors (ulimit -n), which finally got my attention.

(very minor point) One commentator objected to the hard-coded numerical arguments to shutdown(), rather than e.g. SHUT_WR for 1. The simplest answer is that Windows uses different #defines/enums e.g. SD_SEND. And many other writers (e.g. Beej) use constants, as do many legacy systems.

Also, I always, always, set FD_CLOEXEC on all my sockets, since in my applications I never want them passed to a child and, more importantly, I don't want a hung child to impact me.
Sample code to set CLOEXEC:
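A minimal sketch of one common way to do this with fcntl(), offered as an illustration and not necessarily the code this answer originally showed (set_cloexec is a hypothetical name):

```c
#include <assert.h>
#include <fcntl.h>

/* Mark fd close-on-exec so it is not inherited across exec().
 * Returns 0 on success, -1 on error (errno set by fcntl). */
int set_cloexec(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);     /* descriptor flags, not F_GETFL status flags */
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

In a multi-threaded server there is a small window between accept() and this call during which another thread could fork; newer Linux kernels offer accept4() with SOCK_CLOEXEC to set the flag atomically, where available.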
Great answer from Joseph Quinsey. I have comments on the haveInput function. Wondering how likely it is that select returns an fd you did not include in your set. This would be a major OS bug IMHO. That's the kind of thing I would check if I wrote unit tests for the select function, not in an ordinary app.

My other comment pertains to the handling of EINTR. In theory, you could get stuck in an infinite loop if select kept returning EINTR, as this error lets the loop start over. Given the very short timeout (0.01), it appears highly unlikely to happen. However, I think the appropriate way of dealing with this would be to return errors to the caller (flushSocketBeforeClose). The caller can keep calling haveInput as long as its timeout hasn't expired, and declare failure for other errors.

ADDITION #1

flushSocketBeforeClose will not exit quickly in case of read returning an error. It will keep looping until the timeout expires. You can't rely on the select inside haveInput to anticipate all errors. read has errors of its own (ex: EIO).