Cannot assign requested address - possible causes?

2019-01-11 17:56发布

问题:

I have a program that consists of a master server and distributed slave servers. The slave servers send status updates to the server, and if the server hasn't heard from a specific slave in a fixed period, it marks the slave as down. This is happening consistently.

From inspecting logs, I have found that the slave is only able to send one status update to the server, and then is never able to send another update, always failing on the call to connect() "Cannot assign requested address (99).

Oddly enough, the slave is able to send several other updates to the server, and all of the connections are happening on the same port. It seems that the most common cause of this failure is that connections are left open, but I'm having trouble finding anything left open. Are there other possible explanations?

To clarify, here's how I'm connecting:

struct sockaddr *sa; // parameter
size_t           sa_size; //parameter
int              i = 1;
int              stream;

stream = socket(AF_INET,SOCK_STREAM,0);
setsockopt(stream,SOL_SOCKET,SO_REUSEADDR,&i,sizeof(i));
bindresvport(stream,NULL);
connect(stream,sa,sa_size);

This code is in a function to obtain a connection to another server, and a failure on any of those 4 calls causes the function to fail.

回答1:

Maybe SO_REUSEADDR helps here? http://www.unixguide.net/network/socketfaq/4.5.shtml



回答2:

It turns out that the problem really was that the address was busy - the busyness was caused by some other problems in how we are handling network communications. Your inputs have helped me figure this out. Thank you.

EDIT: to be specific, the problems in handling our network communications were that these status updates would be constantly re-sent if the first failed. It was only a matter of time until we had every distributed slave trying to send its status update at the same time, which was over-saturating our network.



回答3:

this is just a shot in the dark : when you call connect without a bind first, the system allocates your local port, and if you have multiple threads connecting and disconnecting it could possibly try to allocate a port already in use. the kernel source file inet_connection_sock.c hints at this condition. just as an experiment try doing a bind to a local port first, making sure each bind/connect uses a different local port number.



回答4:

sysctl -w net.ipv4.tcp_timestamps=1
sysctl -w net.ipv4.tcp_tw_recycle=1