I'm trying to supply a timeout for connect(). I've searched around and found several articles related to this. I've coded up what I believe should work but unfortunately I get no error reported from getsockopt(). But then when I come to the write() it fails with an errno of 107 - ENOTCONN.
A couple of points. I'm running on Fedora 23. The docs for connect() says it should return failure with an errno of EINPROGRESS for a connect that is not complete yet however I was experiencing EAGAIN so I added that to my check. Currently my socket server is setting the backlog to zero in the listen() call. Many of the calls succeed but the ones that fail all fail with the 107 - ENOTCONN I had mentioned in the write() call.
I'm hoping I'm just missing something but so far can't figure out what.
int domain_socket_send(const char* socket_name, unsigned char* buffer,
unsigned int length, unsigned int timeout)
{
struct sockaddr_un addr;
int fd = -1;
int result = 0;
// Create socket.
fd = socket(AF_UNIX, SOCK_STREAM, 0);
if (fd == -1)
{
result = -1;
goto done;
}
if (timeout != 0)
{
// Enabled non-blocking.
int flags;
flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
// Set socket name.
memset(&addr, 0, sizeof(addr));
addr.sun_family = AF_UNIX;
strncpy(addr.sun_path, socket_name, sizeof(addr.sun_path) - 1);
// Connect.
result = connect(fd, (struct sockaddr*) &addr, sizeof(addr));
if (result == -1)
{
// If some error then we're done.
if ((errno != EINPROGRESS) && (errno != EAGAIN))
goto done;
fd_set write_set;
struct timeval tv;
// Set timeout.
tv.tv_sec = timeout / 1000000;
tv.tv_usec = timeout % 1000000;
unsigned int iterations = 0;
while (1)
{
FD_ZERO(&write_set);
FD_SET(fd, &write_set);
result = select(fd + 1, NULL, &write_set, NULL, &tv);
if (result == -1)
goto done;
else if (result == 0)
{
result = -1;
errno = ETIMEDOUT;
goto done;
}
else
{
if (FD_ISSET(fd, &write_set))
{
socklen_t len;
int socket_error;
len = sizeof(socket_error);
// Get the result of the connect() call.
result = getsockopt(fd, SOL_SOCKET, SO_ERROR,
&socket_error, &len);
if (result == -1)
goto done;
// I think SO_ERROR will be zero for a successful
// result and errno otherwise.
if (socket_error != 0)
{
result = -1;
errno = socket_error;
goto done;
}
// Now that the socket is writable issue another connect.
result = connect(fd, (struct sockaddr*) &addr,
sizeof(addr));
if (result == 0)
{
if (iterations > 1)
{
printf("connect() succeeded on iteration %d\n",
iterations);
}
break;
}
else
{
if ((errno != EAGAIN) && (errno != EINPROGRESS))
{
int err = errno;
printf("second connect() failed, errno = %d\n",
errno);
errno = err;
goto done;
}
iterations++;
}
}
}
}
}
// If we put the socket in non-blocking mode then put it back
// to blocking mode.
if (timeout != 0)
{
// Turn off non-blocking.
int flags;
flags = fcntl(fd, F_GETFL);
fcntl(fd, F_SETFL, flags & ~O_NONBLOCK);
}
// Write buffer.
result = write(fd, buffer, length);
if (result == -1)
{
int err = errno;
printf("write() failed, errno = %d\n", err);
errno = err;
goto done;
}
done:
if (result == -1)
result = errno;
else
result = 0;
if (fd != -1)
{
shutdown(fd, SHUT_RDWR);
close(fd);
}
return result;
}
UPDATE 04/05/2016:
It dawned on me that maybe I need to call connect() multiple times until successful, after all this is non-blocking io not async io. Just like I have to call read() again when there is data to read after encountering an EAGAIN on a read(). In addition, I found the following SO question:
Using select() for non-blocking sockets to connect always returns 1
in which EJP's answer says you need to issue multiple connect()'s. Also, from the book EJP references:
https://books.google.com/books?id=6H9AxyFd0v0C&pg=PT681&lpg=PT681&dq=stevens+and+wright+tcp/ip+illustrated+non-blocking+connect&source=bl&ots=b6kQar6SdM&sig=kt5xZubPZ2atVxs2VQU4mu7NGUI&hl=en&sa=X&ved=0ahUKEwjmp87rlfbLAhUN1mMKHeBxBi8Q6AEIIzAB#v=onepage&q=stevens%20and%20wright%20tcp%2Fip%20illustrated%20non-blocking%20connect&f=false
it seems to indicate you need to issue multiple connect()'s. I've modified the code snippet in this question to call connect() until it succeeds. I probably still need to make changes around possibly updating the timeout value passed to select(), but that's not my immediate question.
Calling connect() multiple times appears to have fixed my original problem, which was that I was getting ENOTCONN when calling write(), I guess because the socket was not connected. However, you can see from the code that I'm tracking how many times through the select loop until connect() succeeds. I've seen the number go into the thousands. This gets me worried that I'm in a busy wait loop. Why is the socket writable even though it's not in a state that connect() will succeed? Is calling connect() clearing that writable state and it's getting set again by the OS for some reason, or am I really in a busy wait loop?
Thanks, Nick