Socket still listening after application crash

Posted 2020-07-22 19:52

Question:

I'm having a problem with one of my C++ applications on Windows 2008x64 (same app runs just fine on Windows 2003x64).

After a crash, or sometimes even after a regular shutdown/restart cycle, it cannot use the socket on port 82 that it needs to receive commands.

Looking at netstat I see the socket is still in listening state more than 10 minutes after the application stopped (the process is definitely not running anymore).

  TCP    0.0.0.0:82             LISTENING

I tried setting the socket option to REUSEADDR but as far as I know that only affects re-connecting to a port that's in TIME_WAIT state. Either way this change didn't seem to make any difference.

int doReuse = 1;
setsockopt(listenFd, SOL_SOCKET, SO_REUSEADDR,
           (const char *)&doReuse, sizeof(doReuse)); 

Any ideas what I can do to solve or at least avoid this problem?

EDIT:

I ran netstat -an, but this is all I get:

  TCP    0.0.0.0:82             0.0.0.0:0              LISTENING

For netstat -anb I get:

  TCP    0.0.0.0:82             0.0.0.0:0              LISTENING
 [System]

I'm aware of the need to shut down gracefully, but even if the app crashes for some reason I still need to be able to restart it. The application in question uses an in-house library that internally uses the Windows Sockets API.

EDIT:

Apparently there is no solution for this problem, so for development I will go with a proxy / tool to work around it. Thanks for all the suggestions, much appreciated.

Answer 1:

If this is only hurting you at debug time, use TCPView from Sysinternals to force the socket closed. I assume it works on your platform, but I am not sure.

If you're doing blocking operations on any sockets, do not use an indefinite timeout. In my experience this can cause weird behavior on a multiprocessor machine. I'm not sure which Windows Server version it was, but it was one or two versions before Server 2003. Instead of an indefinite timeout, use a 30 to 60 second timeout and then just repeat the wait. This goes for overlapped I/O and I/O completion ports as well, if you're using them.
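
A minimal sketch of the "bounded wait, then repeat" pattern described above, using select(). The helper names waitReadable and waitUntilReadable and the stop flag are hypothetical, not part of the asker's in-house library. The sketch assumes a plain blocking socket and, on Windows, that WSAStartup() has already been called. It does not cover overlapped I/O or completion ports.

```cpp
#include <atomic>
#include <cassert>

#ifdef _WIN32
  #include <winsock2.h>
  typedef SOCKET socket_t;
#else
  #include <sys/select.h>
  #include <sys/socket.h>
  #include <unistd.h>
  typedef int socket_t;
#endif

// Hypothetical helper: waits until `fd` is readable or `timeoutSeconds` elapse.
// Returns 1 if readable, 0 on timeout, -1 on error.
int waitReadable(socket_t fd, int timeoutSeconds)
{
    assert(timeoutSeconds > 0);

    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);

    timeval tv;
    tv.tv_sec = timeoutSeconds;
    tv.tv_usec = 0;

    // Winsock ignores the first argument; POSIX needs highest descriptor + 1.
    int rc = select(static_cast<int>(fd) + 1, &readSet, nullptr, nullptr, &tv);
    if (rc < 0) return -1;
    return rc > 0 ? 1 : 0;
}

// Hypothetical helper: repeats the bounded wait until the socket is readable,
// an error occurs, or `stop` is set by another thread.
// Returns true only if the socket became readable.
bool waitUntilReadable(socket_t fd, const std::atomic<bool> &stop,
                       int timeoutSeconds = 30)
{
    while (!stop.load()) {
        int rc = waitReadable(fd, timeoutSeconds);
        if (rc != 0) return rc > 0;  // readable (true) or error (false)
        // Timed out: go around and wait again.
    }
    return false;
}
```

Because the wait wakes up every timeoutSeconds, the loop gets a regular chance to notice a shutdown request instead of blocking forever.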

If this is an app you're shipping for others to use, good luck. Windows can be a pure bastard when using sockets...



Answer 2:

"I tried setting the socket option to REUSEADDR but as far as I know that only affects re-connecting to a port that's in TIME_WAIT state."

That's not quite correct. It will let you re-use a port in TIME_WAIT state for any purpose, i.e. listen or connect. But I agree it won't help with this. I'm surprised by the comment about the OS taking 10 minutes to detect the crashed listener. It should clean up all resources as soon as the process ends, other than ports in the TIME_WAIT state.



Answer 3:

The first thing to check is that it really is your application listening on that port. Use:

netstat -anb

to figure out which process is listening on that port.

The second thing to check is that you are closing the socket gracefully when your application shuts down. If you're using a high-level socket API that shouldn't be too much of an issue (you are using a socket API, right?).

Finally, how is your application structured? Is it threaded? Does it launch other processes? How do you know that your application is really shut down?



Answer 4:

Run

netstat -ano

This will give you the PID of the process that has the port open. Look up that process in Task Manager, and make sure "Show processes from all users" is checked.



Answer 5:

http://hea-www.harvard.edu/~fine/Tech/addrinuse.html is a great resource for "Bind: Address Already in Use" errors.

Some extracts:

TIME_WAIT is the state that typically ties up the port for several minutes after the process has completed. The length of the associated timeout varies on different operating systems, and may be dynamic on some operating systems, however typical values are in the range of one to four minutes.

Strategies for Avoidance

SO_REUSEADDR

This is both the simplest and the most effective option for reducing the "address already in use" error.

Client Closes First

TIME_WAIT can be avoided if the remote end initiates the closure. So the server can avoid problems by letting the client close first.

Reduce Timeout

If (for whatever reason) neither of these options works for you, it may also be possible to shorten the timeout associated with TIME_WAIT.
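
As a rough illustration of the SO_REUSEADDR strategy, here is a sketch that sets the option before bind() and reports the error code when binding fails. The name openListener and the small wrapper helpers are hypothetical. On Windows the sketch assumes WSAStartup() has already been called. Two caveats: as noted elsewhere in this thread, the option addresses TIME_WAIT and is not expected to help with a socket that is still in LISTENING state, and on Windows SO_REUSEADDR also permits a second socket to bind to a port that is already bound, so its semantics differ from the BSD behavior the extract describes.

```cpp
#include <cstdio>
#include <cstring>

#ifdef _WIN32
  #include <winsock2.h>
  typedef SOCKET socket_t;
  const socket_t kInvalidSocket = INVALID_SOCKET;
  static void closeSocket(socket_t s) { closesocket(s); }
  static int lastSocketError() { return WSAGetLastError(); }
#else
  #include <arpa/inet.h>
  #include <cerrno>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <unistd.h>
  typedef int socket_t;
  const socket_t kInvalidSocket = -1;
  static void closeSocket(socket_t s) { close(s); }
  static int lastSocketError() { return errno; }
#endif

// Hypothetical helper: opens a TCP listener on `port` (0 = any free port)
// with SO_REUSEADDR set before bind(). Returns kInvalidSocket on failure.
socket_t openListener(unsigned short port)
{
    socket_t listenFd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenFd == kInvalidSocket) return kInvalidSocket;

    // The option must be set before bind() to have any effect.
    int doReuse = 1;
    if (setsockopt(listenFd, SOL_SOCKET, SO_REUSEADDR,
                   (const char *)&doReuse, sizeof(doReuse)) != 0) {
        std::fprintf(stderr, "setsockopt failed: %d\n", lastSocketError());
    }

    sockaddr_in addr;
    std::memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    // Report the error code so "address already in use" is visible at startup.
    if (bind(listenFd, (const sockaddr *)&addr, sizeof(addr)) != 0 ||
        listen(listenFd, SOMAXCONN) != 0) {
        std::fprintf(stderr, "bind/listen on port %u failed: %d\n",
                     (unsigned)port, lastSocketError());
        closeSocket(listenFd);
        return kInvalidSocket;
    }
    return listenFd;
}
```

Logging the error code at startup at least makes it obvious when the port is still held, which is the point at which netstat becomes useful.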



Answer 6:

After seeing https://superuser.com/a/453827/56937 I discovered that there was a WerFault process that was suspended.

It must have inherited the sockets from the non-existent process because killing it freed up my listening ports.
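
If inherited handles are indeed the cause, one possible mitigation is to mark the listening socket as non-inheritable right after creating it, so that processes started later do not receive a copy. The sketch below is an assumption-laden illustration: makeNonInheritable is a hypothetical helper, and this thread does not establish whether it would have prevented the WerFault case. On Windows it clears HANDLE_FLAG_INHERIT with SetHandleInformation; the non-Windows branch shows the POSIX analogue, FD_CLOEXEC.

```cpp
#ifdef _WIN32
  #include <winsock2.h>
  #include <windows.h>
  typedef SOCKET socket_t;
#else
  #include <fcntl.h>
  #include <netinet/in.h>
  #include <sys/socket.h>
  #include <unistd.h>
  typedef int socket_t;
#endif

// Hypothetical helper: marks `s` so that child processes created afterwards
// do not inherit it. Returns true on success.
bool makeNonInheritable(socket_t s)
{
#ifdef _WIN32
    // Clear the inherit flag on the handle underlying the socket.
    return SetHandleInformation(reinterpret_cast<HANDLE>(s),
                                HANDLE_FLAG_INHERIT, 0) != 0;
#else
    // POSIX analogue: close the descriptor automatically on exec().
    int flags = fcntl(s, F_GETFD);
    if (flags == -1) return false;
    return fcntl(s, F_SETFD, flags | FD_CLOEXEC) != -1;
#endif
}
```

A call such as makeNonInheritable(listenFd) would go immediately after the socket is created, before any child process is launched.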