A customer of mine has a Windows application where there is a network connection between two machines. The system is supposed to cope with the connection being lost. It does this by keeping a counter on the client position which is reset every time data is received from the server. If the counter reaches 60 seconds (i.e. we haven't heard from the server for 60 seconds) it performs some expected action to cope with the connection being lost.
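For illustration, the counter described above could be sketched roughly as follows. This is not the customer's actual code: the class name `IdleWatchdog`, its methods, and the injectable `clock` parameter are all hypothetical, and the `socket_error` hook simply assumes that a socket error should be treated the same way as 60 seconds of silence.

```python
import time


class IdleWatchdog:
    """Tracks how long it has been since data last arrived (hypothetical helper)."""

    def __init__(self, timeout_s=60.0, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock            # injectable so the logic can be tested
        self._last_rx = clock()
        self._failed = False

    def data_received(self):
        # Reset the counter every time data is received from the server.
        self._last_rx = self._clock()

    def socket_error(self):
        # Assumption: any socket error counts as a lost connection.
        self._failed = True

    def expired(self):
        # True once the socket has failed or the server has been silent too long.
        elapsed = self._clock() - self._last_rx
        return self._failed or elapsed >= self._timeout_s
```

With this shape, the "expected action" runs whenever `expired()` returns true, regardless of whether the cause was silence or an error such as 10057.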
The customer has a problem, however, where sometimes the connection will be lost but the client doesn't perform the expected action. Upon investigation, it appears that this is an intermittent problem caused by the client's socket to the server sometimes raising error 10057 (WSAENOTCONN / "Socket is not connected") when the connection is lost. Because the client behaves differently when it gets a socket error the customer doesn't get the desired behaviour when they get this socket error. This is not difficult for me to fix, but I am a bit puzzled by the different behaviour.
To reproduce the problem I'm physically pulling the network cable out of the back of my server machine. The majority of the time, the effect on the client side is that we just don't get any data over the socket, and we don't get an error. Some fraction of the time, however, error 10057 is raised. Can anyone shed any light on why there is this inconsistency? The client socket is a nonblocking STREAM socket.
WSAENOTCONN is a bug in your application. It isn't a result of a lost connection. The result of a lost connection is WSAECONNRESET. Your code must have got WSAECONNRESET and then proceeded to use the connection as though it was still valid. Then you get WSAENOTCONN.
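One way to avoid reusing a socket after it has failed is to retire it on the first error and refuse to touch it again. The sketch below is only an illustration in Python, not the asker's code: `ServerLink`, `poll`, and `_retire` are hypothetical names, and it assumes any `OSError` other than "would block" (which includes a connection reset) means the link is gone.

```python
import select
import socket


class ServerLink:
    """Wraps a non-blocking stream socket and retires it on the first failure."""

    def __init__(self, sock):
        sock.setblocking(False)
        self._sock = sock
        self.lost = False

    def poll(self, timeout_s=0.0):
        """Return received bytes, or None if nothing arrived or the link is lost."""
        if self.lost:
            return None  # never touch a socket that has already failed
        readable, _, _ = select.select([self._sock], [], [], timeout_s)
        if not readable:
            return None
        try:
            data = self._sock.recv(4096)
        except BlockingIOError:
            return None  # spurious wakeup, no data yet
        except OSError:
            # Covers ConnectionResetError (WSAECONNRESET on Windows).
            self._retire()
            return None
        if not data:
            self._retire()  # peer closed the connection in an orderly way
            return None
        return data

    def _retire(self):
        self.lost = True
        self._sock.close()
```

The caller checks `link.lost` after each `poll()` and runs the connection-lost handling once, instead of issuing further calls on the dead socket.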
I would expect you to get an error only when you try to send something, because that is when the TCP connection discovers it can't reach the other endpoint. The time it takes to discover the failure varies, depending on the network round-trip time. There is also a "keep alive" socket option (SO_KEEPALIVE) that makes the socket periodically probe the peer, so a failure is detected even when the app is idle.
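As a rough sketch of the keep-alive idea, the standard `SO_KEEPALIVE` socket option can be switched on as shown below (Python here for brevity; in Winsock the equivalent is a `setsockopt` call with the same option name). The helper name `enable_keepalive` is made up for this example. Note that the default probe interval is usually long, commonly around two hours, and tuning it is platform-specific, so this alone may not replace an application-level timeout.

```python
import socket


def enable_keepalive(sock):
    """Turn on TCP keep-alive probes and report whether the option took effect."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Read the option back; a non-zero value means keep-alive is enabled.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

It would typically be called once, right after the client socket is created or connected.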