A customer of mine has a Windows application in which two machines communicate over a network connection. The system is supposed to cope with the connection being lost. It does this by keeping a timer on the client side that is reset every time data is received from the server. If the timer reaches 60 seconds (i.e. we haven't heard from the server for 60 seconds), the client performs an expected action to cope with the lost connection.
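To make the mechanism concrete, here is a minimal sketch of the idea in Python. It is not the customer's actual code; the class name `ServerWatchdog`, its method names, and the injectable `clock` parameter are all hypothetical and exist only for illustration.

```python
import time


class ServerWatchdog:
    """Tracks how long it has been since data last arrived from the server.

    Hypothetical illustration only; names are not from the real application.
    """

    def __init__(self, timeout_seconds=60.0, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock              # injectable so the logic can be tested
        self._last_heard = clock()

    def data_received(self):
        # Called whenever any data arrives from the server: restart the timer.
        self._last_heard = self._clock()

    def connection_lost(self):
        # True once the server has been silent for the whole timeout period.
        return self._clock() - self._last_heard >= self._timeout
```

The client would call `data_received()` from its receive path and check `connection_lost()` periodically, taking the expected action when it returns `True`.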
The customer has a problem, however: sometimes the connection is lost but the client doesn't perform the expected action. Upon investigation, this appears to be an intermittent problem caused by the client's socket to the server sometimes raising error 10057 (WSAENOTCONN / "Socket is not connected") when the connection is lost. The client takes a different code path when it gets a socket error than when the timer expires, so in that case the customer doesn't get the desired behaviour. This is not difficult for me to fix, but I am a bit puzzled by the different behaviour.
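The fix I have in mind is roughly the following: route a hard socket error into the same "connection lost" handling as the timeout. This is only a sketch under assumptions; `poll_socket`, `on_data` and `on_connection_lost` are hypothetical names, and the real application is not necessarily written in Python.

```python
def poll_socket(sock, on_data, on_connection_lost):
    """Attempt one read on a non-blocking stream socket.

    Hypothetical helper: any hard socket error (for example 10057,
    WSAENOTCONN) is routed to the same handler used for a lost connection.
    """
    try:
        data = sock.recv(4096)
    except BlockingIOError:
        # No data available right now; leave it to the inactivity timer.
        return
    except OSError as exc:
        # Any other socket error is treated as a lost connection.
        on_connection_lost(exc)
        return

    if data:
        on_data(data)
    else:
        # recv() returning no bytes means the peer closed the connection.
        on_connection_lost(None)
```

`BlockingIOError` is a subclass of `OSError`, so it has to be caught first; otherwise an ordinary "no data yet" result would be mistaken for a lost connection.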
To reproduce the problem, I'm physically pulling the network cable out of the back of my server machine. The majority of the time, the effect on the client side is that we simply stop receiving data over the socket, with no error. Some fraction of the time, however, error 10057 is raised. Can anyone shed any light on why there is this inconsistency? The client socket is a non-blocking stream (SOCK_STREAM) socket.