I seem to be having a problem with my sockets. Below, you will see some code which forks a server and a client. The server opens a TCP socket, and the client connects to it and then closes it. Sleeps are used to coordinate the timing. After the client-side close(), the server tries to write() to its own end of the TCP connection. According to the write(2) man page, this should give me a SIGPIPE and an EPIPE errno. However, I don't see this. From the server's point of view, the write to a local, closed socket succeeds, and absent the EPIPE I can't see how the server should be detecting that the client has closed the socket.
In the gap between the client closing its end and the server attempting to write, a call to netstat will show that the connection is in a CLOSE_WAIT/FIN_WAIT2 state, so the server end should definitely be able to reject the write.
For reference, I'm on Debian Squeeze, uname -r is 2.6.39-bpo.2-amd64.
What's going on here?
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>   /* struct sockaddr_in */
#include <netinet/tcp.h>
#include <arpa/inet.h>    /* htons */
#include <errno.h>
#include <string.h>
#include <stdlib.h>
#include <fcntl.h>
#include <netdb.h>

#define SERVER_ADDRESS "127.0.0.7"
#define SERVER_PORT 4777

#define myfail_if( test, msg ) do { if((test)){ fprintf(stderr, msg "\n"); exit(1); } } while (0)
#define myfail_unless( test, msg ) myfail_if( !(test), msg )

/* Resolve addr, connect to actual_port, and return the connected fd. */
int connect_client( char *addr, int actual_port )
{
    int client_fd;
    struct addrinfo hint;
    struct addrinfo *ailist, *aip;

    memset( &hint, '\0', sizeof( struct addrinfo ) );
    hint.ai_family = AF_INET;   /* the sockaddr_in cast below assumes IPv4 */
    hint.ai_socktype = SOCK_STREAM;
    myfail_if( getaddrinfo( addr, NULL, &hint, &ailist ) != 0, "getaddrinfo failed." );

    int connected = 0;
    for( aip = ailist; aip; aip = aip->ai_next ) {
        ((struct sockaddr_in *)aip->ai_addr)->sin_port = htons( actual_port );
        client_fd = socket( aip->ai_family, aip->ai_socktype, aip->ai_protocol );
        if( client_fd == -1) { continue; }
        if( connect( client_fd, aip->ai_addr, aip->ai_addrlen) == 0 ) {
            connected = 1;
            break;
        }
        close( client_fd );
    }
    freeaddrinfo( ailist );
    myfail_unless( connected, "Didn't connect." );
    return client_fd;
}

/* Client: connect after the server is listening, then close immediately. */
void client(){
    sleep(1);
    int client_fd = connect_client( SERVER_ADDRESS, SERVER_PORT );
    printf("Client closing its fd... ");
    myfail_unless( 0 == close( client_fd ), "close failed" );
    fprintf(stdout, "Client exiting.\n");
    exit(0);
}

/* Create a TCP socket and bind it to saddr. */
int init_server( struct sockaddr * saddr, socklen_t saddr_len )
{
    int sock_fd;

    sock_fd = socket( saddr->sa_family, SOCK_STREAM, 0 );
    if ( sock_fd < 0 ){
        return sock_fd;
    }
    myfail_unless( bind( sock_fd, saddr, saddr_len ) == 0, "Failed to bind." );
    return sock_fd;
}

/* Resolve addr, bind to port, and start listening. */
int start_server( const char * addr, int port )
{
    struct addrinfo *ailist, *aip;
    struct addrinfo hint;
    int sock_fd = -1;

    memset( &hint, '\0', sizeof( struct addrinfo ) );
    hint.ai_family = AF_INET;   /* the sockaddr_in cast below assumes IPv4 */
    hint.ai_socktype = SOCK_STREAM;
    myfail_if( getaddrinfo( addr, NULL, &hint, &ailist ) != 0, "getaddrinfo failed." );

    for( aip = ailist; aip; aip = aip->ai_next ){
        ((struct sockaddr_in *)aip->ai_addr)->sin_port = htons( port );
        sock_fd = init_server( aip->ai_addr, aip->ai_addrlen );
        if ( sock_fd >= 0 ){   /* 0 is a valid descriptor */
            break;
        }
    }
    freeaddrinfo( ailist );    /* free the head of the list, not the cursor */
    myfail_unless( sock_fd >= 0, "Failed to create server socket" );
    myfail_unless( listen( sock_fd, 2 ) == 0, "Failed to listen" );
    return sock_fd;
}

int server_accept( int server_fd )
{
    printf("Accepting\n");
    int client_fd = accept( server_fd, NULL, NULL );
    myfail_unless( client_fd >= 0, "Failed to accept" );
    return client_fd;
}

/* Server: accept one connection, wait for the client to close, then write. */
void server() {
    int server_fd = start_server(SERVER_ADDRESS, SERVER_PORT);
    int client_fd = server_accept( server_fd );

    printf("Server sleeping\n");
    sleep(60);

    printf( "Errno before: %s\n", strerror( errno ) );
    /* write() returns ssize_t, so cast it for %d */
    printf( "Write result: %d\n", (int) write( client_fd, "123", 3 ) );
    printf( "Errno after: %s\n", strerror( errno ) );

    close( client_fd );
}

int main(void){
    pid_t clientpid;
    pid_t serverpid;

    clientpid = fork();

    if ( clientpid == 0 ) {
        client();
    } else {
        serverpid = fork();

        if ( serverpid == 0 ) {
            server();
        }
        else {
            int clientstatus;
            int serverstatus;

            waitpid( clientpid, &clientstatus, 0 );
            waitpid( serverpid, &serverstatus, 0 );

            printf( "Client status is %d, server status is %d\n",
                    clientstatus, serverstatus );
        }
    }

    return 0;
}
This is what the Linux man page says about write and EPIPE:

When Linux is using a pipe or a socketpair, it can and will check the reading end of the pair, as these two programs would demonstrate:

Linux is able to do so because the kernel has innate knowledge about the other end of the pipe or connected pair. However, when using connect, the state about the socket is maintained by the protocol stack. Your test demonstrates this behavior, but below is a program that does it all in a single thread, similar to the two tests above:

If you run the above program, you will get output similar to this:
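A minimal sketch of such a single-threaded test, written under stated assumptions rather than taken from the original program: Linux, a loopback listener on a kernel-chosen port, and send() with MSG_NOSIGNAL in place of write() so that no SIGPIPE is raised. The helper names send_one_byte and tcp_send_twice_after_peer_close are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one byte without raising SIGPIPE (MSG_NOSIGNAL).
 * Returns 0 on success, otherwise the errno of the failed send. */
static int send_one_byte(int fd)
{
    return send(fd, "x", 1, MSG_NOSIGNAL) == 1 ? 0 : errno;
}

/* Hypothetical helper: connect a client and a server socket over loopback
 * in one thread, close the client, then send twice from the server side.
 * result[0] and result[1] receive the outcome of each send.
 * Returns 0 if the connection could be set up, -1 otherwise. */
int tcp_send_twice_after_peer_close(int result[2])
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    struct pollfd pfd;
    char byte;
    int lfd, cfd, sfd = -1;

    assert(result != NULL);
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                    /* let the kernel pick a port */

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd >= 0 && cfd >= 0 &&
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        listen(lfd, 1) == 0 &&
        getsockname(lfd, (struct sockaddr *)&addr, &len) == 0 &&
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) == 0)
        sfd = accept(lfd, NULL, NULL);
    if (sfd < 0) {
        if (lfd >= 0) close(lfd);
        if (cfd >= 0) close(cfd);
        return -1;
    }

    close(cfd);                           /* client side sends its FIN */
    while (read(sfd, &byte, 1) > 0)       /* read() returns 0 once the FIN is in */
        ;
    result[0] = send_one_byte(sfd);       /* socket is in CLOSE_WAIT: accepted */

    pfd.fd = sfd;
    pfd.events = 0;                       /* POLLERR/POLLHUP are always reported */
    pfd.revents = 0;
    poll(&pfd, 1, 1000);                  /* give the client's RST time to arrive */
    result[1] = send_one_byte(sfd);       /* the connection has been reset by now */

    close(sfd);
    close(lfd);
    return 0;
}
```

On Linux, result[0] is expected to be 0 (the first send is accepted by the stack) and result[1] to be EPIPE (the reset provoked by the first send has arrived).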
This shows it took one write for the sockets to transition to the CLOSED states. To find out why this occurred, a TCP dump of the transaction can be useful:

The first three lines represent the 3-way handshake. The fourth line is the FIN packet the client sends to the server, and the fifth line is the ACK from the server, acknowledging receipt. The sixth line is the server trying to send 1 byte of data to the client with the PUSH flag set. The final line is the client RESET packet, which causes the TCP state for the connection to be freed, and is why the third netstat command did not result in any output in the test above.

So, the server doesn't know the client will reset the connection until after it tries to send some data to it. The reset happens because the client called close, instead of something else.

The server cannot know for certain what system call the client has actually issued; it can only follow the TCP state. For example, we could replace the close call with a call to shutdown instead.

The difference between shutdown and close is that shutdown only governs the state of the connection, while close also governs the state of the file descriptor that represents the socket. A shutdown will not close a socket.

The output will be different with the shutdown change:

The TCP dump will also show something different:
Notice the reset at the end comes 5 seconds after the last ACK packet. This reset is due to the program shutting down without properly closing the sockets. It is the ACK packet from the client to the server before the reset that is different than before. This is the indication that the client did not use close. In TCP, the FIN indication is really an indication that there is no more data to be sent. But since a TCP connection is bi-directional, the server that receives the FIN assumes the client can still receive data. In the case above, the client in fact does accept the data.

Whether the client uses close or SHUT_WR to issue a FIN, in either case you can detect the arrival of the FIN by polling on the server socket for a readable event. If after calling read the result is 0, then you know the FIN has arrived, and you can do what you wish with that information.

Now, it is trivially true that if the server issues SHUT_WR with shutdown before it tries to do a write, it will in fact get the EPIPE error.

If, instead, you want the client to indicate an immediate reset to the server, you can force that to happen on most TCP stacks by enabling the linger option, with a linger timeout of 0, prior to calling close.

With the above change, the output of the program becomes:
The send gets an immediate error in this case, but it is not EPIPE, it is ECONNRESET. The TCP dump reflects this as well:

The RESET packet comes right after the 3-way handshake completes. However, using this option has its dangers. If the other end has unread data in the socket buffer when the RESET arrives, that data will be purged, causing the data to be lost. Forcing a RESET to be sent is usually used in request/response style protocols. The sender of the request can know there can be no data lost when it receives the entire response to its request. Then, it is safe for the request sender to force a RESET to be sent on the connection.

You have two sockets, one for the client and another for the server. Your client is doing the active close. This means TCP's connection termination has been started by the client (a TCP FIN segment has been sent from the client end).
At this stage you see the client socket in FIN_WAIT1 state (FIN_WAIT2 once the server's ACK has arrived). What is the state of the server socket now? It is in CLOSE_WAIT state, so the server socket is not closed.
The FIN from the server has not been sent yet, because the application has not closed the socket. At this stage you are writing over the server socket, so you are not getting an error.
If you want to see the error, just call close(client_fd) before writing over the socket.
Here the server socket is no longer in CLOSE_WAIT state, so you can see that the return value of write is -1 to indicate the error (EBADF in this case, since the descriptor itself has been closed). I hope this clarifies.
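For a server that wants to notice the CLOSE_WAIT condition described above before it writes, one option is to check the socket for end-of-file. The sketch below is an illustration under assumptions: peer_has_closed is a hypothetical name, and it uses recv() with MSG_PEEK rather than read() so that any real data stays queued for the caller.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the peer's FIN is the next thing in the stream
 * (a read would return 0), 0 if not (poll timed out, or unread data is
 * still queued), and -1 on error. timeout_ms is passed to poll(). */
int peer_has_closed(int fd, int timeout_ms)
{
    struct pollfd pfd;
    char byte;
    ssize_t n;
    int ready;

    assert(fd >= 0);
    pfd.fd = fd;
    pfd.events = POLLIN;            /* a FIN makes the socket readable */
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;                   /* nothing to read yet */

    /* Peek so that real data, if any, is left for the caller to read. */
    n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);
    if (n == 0)
        return 1;                   /* end-of-file: the FIN has arrived */
    return n > 0 ? 0 : -1;
}
```

A server could call peer_has_closed(client_fd, 0) just before writing and skip the write when it returns 1. This only narrows the window: a FIN can still arrive between the check and the write.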
After having called write() a first time (as coded in your example) after the client close()ed the socket, you'll get the expected EPIPE and SIGPIPE on any successive call to write().

Just try adding another write() to provoke the error:

The output will be:

The output of the last two printf()s is missing, as the process terminates due to SIGPIPE being raised by the second call to write(). To avoid the termination of the process, you might like to make the process ignore SIGPIPE.

I suspect that what's happening is that the server-side socket is still valid, so your write call is making a valid attempt at writing to your file descriptor even though your TCP session is in a closed state. If I am completely wrong, let me know.
I guess that you're running into the TCP stack detecting a failed send and attempting retransmission. Do subsequent calls to write() fail silently? In other words, try writing five times to the closed socket and see if you eventually get a SIGPIPE. And when you say the write 'succeeds', do you get a return result of 3?