I'm having a problem with one of my C++ applications on Windows 2008x64 (same app runs just fine on Windows 2003x64).
After a crash, or sometimes even after a regular shutdown/restart cycle, it cannot use the socket on port 82 that it needs to receive commands.
Looking at netstat I see the socket is still in listening state more than 10 minutes after the application stopped (the process is definitely not running anymore).
TCP 0.0.0.0:82 LISTENING
I tried setting the socket option to REUSEADDR but as far as I know that only affects re-connecting to a port that's in TIME_WAIT state. Either way this change didn't seem to make any difference.
int doReuse = 1;
setsockopt(listenFd, SOL_SOCKET, SO_REUSEADDR,
(const char *)&doReuse, sizeof(doReuse));
Any ideas what I can do to solve or at least avoid this problem?
EDIT:
I ran netstat -an, but this is all I get:
TCP 0.0.0.0:82 0.0.0.0:0 LISTENING
For netstat -anb I get:
TCP 0.0.0.0:82 0.0.0.0:0 LISTENING
[System]
I'm aware of shutting down gracefully, but even if the app crashes for some reason I still need to be able to restart it. The application in question uses an in-house library that internally uses the Windows Sockets API.
EDIT:
Apparently there is no solution for this problem, so for development I will go with a proxy / tool to work around it. Thanks for all the suggestions, much appreciated.
If this is only hurting you at debug time, use TCPView from Sysinternals to force the socket closed. I assume it works on your platform, but I am not sure.
If you're doing blocking operations on any sockets, do not use an indefinite timeout. In my experience this can cause weird behavior on a multiprocessor machine. I'm not sure which Windows server OS it was, but it was one or two versions before Server 2003.
Instead of an indefinite timeout, use a 30 to 60 second timeout and then just repeat the wait. This goes for overlapped I/O and I/O completion ports as well, if you're using them.
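The "bounded timeout, then repeat the wait" advice above can be sketched roughly as follows. This is only an illustration, not code from the application in question: waitReadable, socket_t, the stop flag and the 30-second default are all made-up names and values, and select() is just one of several ways to wait on a socket. On Windows, WSAStartup must already have been called.

```cpp
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
#include <sys/select.h>
#include <sys/socket.h>
typedef int socket_t;
#endif
#include <atomic>

// Hypothetical helper: waits until `fd` becomes readable, waking up every
// `sliceSeconds` so that the wait is never indefinite.
// Returns true if `fd` is readable, false if `stop` was set or select() failed.
bool waitReadable(socket_t fd, const std::atomic<bool>& stop, long sliceSeconds = 30)
{
    while (!stop.load()) {
        fd_set readSet;
        FD_ZERO(&readSet);
        FD_SET(fd, &readSet);

        // select() may modify the timeout, so rebuild it on every pass.
        timeval tv;
        tv.tv_sec = sliceSeconds;
        tv.tv_usec = 0;

        // The first argument is ignored by Winsock but required on POSIX.
        int rc = select(static_cast<int>(fd) + 1, &readSet, nullptr, nullptr, &tv);
        if (rc > 0) return true;   // data (or a pending connection) is available
        if (rc < 0) return false;  // error; the caller should inspect it
        // rc == 0: the slice timed out, so loop and wait again.
    }
    return false;
}
```

For a listening socket, "readable" means a connection is waiting to be accepted, so a call such as waitReadable(listenFd, stopRequested) could sit in front of accept(). The periodic wake-up also gives the thread a chance to notice a shutdown request.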
If this is an app you're shipping for others to use, good luck. Windows can be a pure bastard when using sockets...
"I tried setting the socket option to REUSEADDR but as far as I know that only affects re-connecting to a port that's in TIME_WAIT state."
That's not quite correct. It will let you re-use a port in TIME_WAIT state for any purpose, i.e. listen or connect. But I agree it won't help with this. I'm surprised by the comment about the OS taking 10 minutes to detect the crashed listener. It should clean up all resources as soon as the process ends, other than ports in the TIME_WAIT state.
The first thing to check is that it really is your application listening on that port. Use:
netstat -anb
to figure out which process is listening on that port.
The second thing to check is that you are closing the socket gracefully when your application shuts down. If you're using a high-level socket API that shouldn't be too much of an issue (you are using a socket API, right?).
Finally, how is your application structured? Is it threaded? Does it launch other processes? How do you know that your application is really shut down?
Run
netstat -ano
This will give you the PID of the process that has the port open. Look up that process in Task Manager, and make sure "Show processes from all users" is checked.
http://hea-www.harvard.edu/~fine/Tech/addrinuse.html is a great resource for "Bind: Address Already in Use" errors.
Some extracts:
TIME_WAIT is the state that typically ties up the port for several minutes after the process has completed. The length of the associated timeout varies on different operating systems, and may be dynamic on some operating systems, however typical values are in the range of one to four minutes.
Strategies for Avoidance
SO_REUSEADDR
This is both the simplest and the most effective option for reducing the "address already in use" error.
Client Closes First
TIME_WAIT can be avoided if the remote end initiates the closure. So the server can avoid problems by letting the client close first.
Reduce Timeout
If (for whatever reason) neither of these options works for you, it may also be possible to shorten the timeout associated with TIME_WAIT.
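As a rough sketch of the SO_REUSEADDR strategy from the extracts above: the option has to be set before bind(), and it is worth checking the return value of every call. openListener, socket_t, kInvalid and closeSock are hypothetical names used only for this example, and on Windows WSAStartup must be called first. Note that this addresses ports held in TIME_WAIT; as discussed earlier in the thread, it is not expected to help with a socket stuck in LISTENING.

```cpp
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
static const socket_t kInvalid = INVALID_SOCKET;
static void closeSock(socket_t s) { closesocket(s); }
#else
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
typedef int socket_t;
static const socket_t kInvalid = -1;
static void closeSock(socket_t s) { close(s); }
#endif
#include <cstring>

// Hypothetical helper: opens a TCP listener on `port` on all interfaces with
// SO_REUSEADDR enabled. Returns kInvalid if any step fails.
socket_t openListener(unsigned short port)
{
    socket_t fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == kInvalid) return kInvalid;

    // Must be set before bind(); setting it afterwards has no effect on the bind.
    int doReuse = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                   reinterpret_cast<const char*>(&doReuse), sizeof(doReuse)) != 0) {
        closeSock(fd);
        return kInvalid;
    }

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0 ||
        listen(fd, SOMAXCONN) != 0) {
        closeSock(fd);
        return kInvalid;
    }
    return fd;
}
```

For the application in the question this would be called as openListener(82). If it returns kInvalid, the error code (WSAGetLastError() on Windows, errno elsewhere) shows which step failed.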
After seeing https://superuser.com/a/453827/56937 I discovered that there was a suspended WerFault process.
It must have inherited the sockets from the non-existent process because killing it freed up my listening ports.