I've encountered a strange bug with TCP sockets. It seems that SO_KEEPALIVE is enabled on all sockets by default.
I wrote a short test case that creates a socket and connects to a server. Immediately after the connect, I check SO_KEEPALIVE with getsockopt. The value is non-zero, which according to MSDN means keep-alive is enabled. Maybe I'm misunderstanding this.
I recently had a strange bug where a server disconnected twice in a row. Some clients were in a state where they had sent logon information and were waiting for a response. Even though there was an overlapped WSARecv posted to the socket connected to the server, no completion was posted to notify the client that the server had crashed, so I'm assuming the socket wasn't fully closed.
Roughly 2 hours later (actually about 1 hour, 59 minutes, and 19 seconds), a completion packet was posted for the read, notifying the client that the connection was no longer open. This is where I started to suspect SO_KEEPALIVE.
I'm trying to understand why this happened. It caused a real problem because clients that lose their connection for any reason are supposed to reconnect to the server automatically; in this case, because no disconnect was reported, the client didn't reconnect until 2 hours later.
An obvious fix is to add a timeout on the logon response, but I'd like to know how this situation could occur.
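To make the timeout idea concrete, here is a minimal sketch of an application-level deadline for the pending logon response. It is plain C++ with no Winsock calls; the class name ResponseDeadline, its methods, and the 30-second value mentioned afterwards are all hypothetical and only illustrate the approach, not what my client currently does.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper (not part of Winsock): tracks whether a reply to an
// outstanding request is overdue, independent of any socket-level keep-alive.
class ResponseDeadline {
public:
    using Clock = std::chrono::steady_clock;

    explicit ResponseDeadline(std::chrono::milliseconds timeout)
        : timeout_(timeout) {
        assert(timeout.count() > 0);
    }

    // Call when the request (for example, the logon message) is sent.
    void arm(Clock::time_point now) {
        armed_ = true;
        sent_at_ = now;
    }

    // Call when the matching response arrives.
    void disarm() { armed_ = false; }

    // True if a request is outstanding and the timeout has elapsed.
    bool expired(Clock::time_point now) const {
        return armed_ && (now - sent_at_) >= timeout_;
    }

private:
    std::chrono::milliseconds timeout_;
    Clock::time_point sent_at_{};
    bool armed_ = false;
};
```

The client would call arm() after sending the logon, disarm() when the read completes, and poll expired() from a timer (say, with a 30-second timeout); on expiry it would close the socket and reconnect rather than wait for a completion that may not arrive for hours.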
SO_KEEPALIVE is not set on the socket by my application server or client.
    // Error checking is removed for this snippet, but all winsock calls succeed.
    #include <winsock2.h>
    #include <iostream>

    #pragma comment(lib, "Ws2_32.lib")

    int main() {
        // Initialize Winsock 2.2.
        WORD wVersionRequested;
        WSADATA wsaData;
        int err;
        wVersionRequested = MAKEWORD(2, 2);
        err = WSAStartup(wVersionRequested, &wsaData);

        // Create a TCP socket; no options are set on it.
        SOCKET foo = WSASocket(AF_INET, SOCK_STREAM, IPPROTO_TCP, 0, 0, 0);

        // Query SO_KEEPALIVE before connecting.
        DWORD optval;
        int optlen = sizeof(optval);
        int test = 0;
        test = getsockopt(foo, SOL_SOCKET, SO_KEEPALIVE, (char*)&optval, &optlen);
        std::cout << "Returned " << optval << std::endl;

        // Connect to a server listening on localhost, port 446.
        sockaddr_in clientService;
        clientService.sin_family = AF_INET;
        clientService.sin_addr.s_addr = inet_addr("127.0.0.1");
        clientService.sin_port = htons(446);
        connect(foo, (SOCKADDR*) &clientService, sizeof(clientService));

        // Query SO_KEEPALIVE again, immediately after the connect.
        test = getsockopt(foo, SOL_SOCKET, SO_KEEPALIVE, (char*)&optval, &optlen);
        std::cout << "Returned " << optval << std::endl;

        std::cin.get();
        return 0;
    }

    // Example output:
    // Returned 2883584
    // Returned 2883584
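As a sanity check on that output, here is a small sketch (plain C++, no Winsock) that splits the printed value into its individual bytes. The helper names byte_at and overwrite_low_byte are hypothetical. The second helper models one possible explanation that I have not verified: that getsockopt wrote fewer than sizeof(DWORD) bytes into optval, leaving whatever was already in the upper bytes of the uninitialized variable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns byte `index` of a 32-bit value,
// where index 0 is the least significant byte.
std::uint8_t byte_at(std::uint32_t value, unsigned index) {
    assert(index < 4);
    return static_cast<std::uint8_t>((value >> (8 * index)) & 0xFFu);
}

// Hypothetical model of a call that writes only ONE byte into a 4-byte
// variable: the low byte is replaced, the other three keep their old
// contents. This is an assumption about the cause, not documented
// getsockopt behavior.
std::uint32_t overwrite_low_byte(std::uint32_t stale, std::uint8_t flag) {
    return (stale & 0xFFFFFF00u) | flag;
}
```

For the value printed above, 2883584 is 0x002C0000, so byte_at(2883584, 0) is 0x00 and byte_at(2883584, 2) is 0x2C. If the single-byte assumption holds, the low byte would be the actual option value and the rest stale data; checking optlen after the call, and zero-initializing optval before it, would confirm or rule this out.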