My application hit a failure that does not seem to be reproducible. A TCP socket connection failed and the application tried to reconnect. The second call to connect(), the one attempting to reconnect, returned an error with errno == EADDRNOTAVAIL, which the man page for connect() says means: "The specified address is not available from the local machine."
Looking at the call to connect(), the second argument appears to be the address the error refers to, but as I understand it, this argument is the TCP socket address of the remote host, so I am confused about the man page referring to the local machine. Is it that the remote host's TCP socket address is not available from my local machine? If so, why would this be? The first call to connect() must have succeeded, since the connection existed before it failed and the application attempted to reconnect and got this error. The arguments to connect() were the same both times.
Is this a transient error that might have gone away had I waited long enough before calling connect() again? If not, how should I try to recover from this failure?
Check this link
http://www.toptip.ca/2010/02/linux-eaddrnotavail-address-not.html
EDIT: Yes I meant to add more but had to cut it there because of an emergency
Did you close the socket before attempting to reconnect? Closing will tell the system that the socketpair (ip/port) is now free.
Here are additional items to look at:
Link with a bug similar to yours (answer is close to the bottom)
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4294599
It seems that your socket is basically stuck in one of the TCP internal states and that adding a delay for reconnection might solve your problem as they seem to have done in that bug report.
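The close-then-retry-with-a-delay idea can be sketched roughly as below. This is only an illustration, not the fix from the bug report: connect_with_retry, max_attempts and delay_sec are hypothetical names, and the policy of retrying only on EADDRNOTAVAIL is an assumption. A fresh socket is created for every attempt, and the failed one is closed before waiting.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the original posts): tries to connect to
 * addr, retrying up to max_attempts times and sleeping delay_sec seconds
 * between attempts, but only while connect() fails with EADDRNOTAVAIL.
 * Returns a connected descriptor, or -1 with errno set. */
int connect_with_retry(const struct sockaddr_in *addr, int max_attempts,
                       unsigned delay_sec)
{
    int saved = EINVAL; /* reported when max_attempts <= 0 */

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        /* Use a new socket for every attempt. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) == 0)
            return fd;

        saved = errno;
        close(fd); /* do not reuse a socket whose connect() failed */

        if (saved != EADDRNOTAVAIL)
            break; /* assumption: other errors are not worth waiting on */

        if (attempt + 1 < max_attempts)
            sleep(delay_sec);
    }

    errno = saved;
    return -1;
}
```

A caller might use something like connect_with_retry(&remote, 5, 2) and treat a return of -1 as a hard failure. The right delay and attempt count depend on the system and are not something the thread specifies.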
Another thing to check is that the interface is up. I got confused by this one recently while using network namespaces, since it seems creating a new network namespace produces an entirely independent loopback interface but doesn't bring it up (at least, with Debian wheezy's versions of things). This escaped me for a while since one doesn't typically think of loopback as ever being down.
This can also happen if an invalid port is given, like 0.
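A cheap guard is to validate the port before it ever reaches connect(). The following is a minimal sketch; parse_port is a hypothetical helper, and treating everything outside 1..65535 as invalid is an assumption about what the caller wants.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal port number from text.
 * Returns the port (1..65535), or -1 if the text is not a number,
 * has trailing characters, or is out of range (including 0). */
int parse_port(const char *text)
{
    char *end = NULL;

    errno = 0;
    long value = strtol(text, &end, 10);

    if (errno != 0 || end == text || *end != '\0')
        return -1;
    if (value < 1 || value > 65535)
        return -1;

    return (int)value;
}
```

With this, a configuration value of "0" or an empty string is rejected up front instead of surfacing later as a confusing connect() error.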
If you are unwilling to change the number of temporary ports available (as suggested by David), or you need more connections than the theoretical maximum, there are two other methods to reduce the number of ports in use. However, they are to various degrees violations of the TCP standard, so they should be used with care.
The first is to turn on SO_LINGER with a zero-second timeout, forcing the TCP stack to send a RST packet and flush the connection state. There is one subtlety, however: you should call shutdown on the socket file descriptor before you close, so that you have a chance to send a FIN packet before the RST packet. The sequence is therefore: set SO_LINGER with a zero timeout, call shutdown, then call close. The server should only see a premature connection reset if the FIN packet gets reordered with the RST packet. See "TCP option SO_LINGER (zero) - when it's required" for more details. (Experimentally, it doesn't seem to matter where you call setsockopt.)
The second is to use SO_REUSEADDR and an explicit bind (even if you're the client), which will allow Linux to reuse temporary ports before they are done waiting. Note that you must use bind with INADDR_ANY and port 0, otherwise SO_REUSEADDR is not respected. The sequence is: set SO_REUSEADDR, bind to INADDR_ANY with port 0, then connect. This option is less good because you'll still saturate the internal kernel data structures for TCP connections as per netstat -an | grep -e tcp -e udp | wc -l. However, you won't start reusing ports until this happens.
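The code that originally accompanied this answer is not available here, so the following is a reconstruction of the two sequences as described, not the author's exact code. close_with_reset and connect_reusing_ports are hypothetical names, and the error handling is an assumption.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Method 1 (sketch): zero-timeout SO_LINGER, then shutdown, then close.
 * Per the description above, shutdown gives the FIN a chance to go out
 * before close resets the connection. Returns 0 on success, -1 on error. */
int close_with_reset(int fd)
{
    struct linger lg;
    lg.l_onoff = 1;  /* enable lingering... */
    lg.l_linger = 0; /* ...with a zero-second timeout */

    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    shutdown(fd, SHUT_RDWR); /* best effort; result deliberately ignored */
    return close(fd);
}

/* Method 2 (sketch): SO_REUSEADDR plus an explicit bind to INADDR_ANY and
 * port 0 on the client side, followed by connect.
 * Returns a connected descriptor, or -1 with errno set. */
int connect_reusing_ports(const struct sockaddr_in *remote)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int on = 1;
    struct sockaddr_in local;
    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(0); /* let the kernel pick the local port */

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) != 0 ||
        bind(fd, (struct sockaddr *)&local, sizeof local) != 0 ||
        connect(fd, (const struct sockaddr *)remote, sizeof *remote) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    return fd;
}
```

The two helpers are independent: a client could open connections with connect_reusing_ports and still close them normally, or use close_with_reset on sockets opened the usual way. Both carry the standards caveats mentioned above.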