I send 2 bytes of application data on a blocking socket every 10 seconds, but the last send call in the log below blocked for more than 40 seconds.
- 2012-06-13 12:02:46.653417|INFO|before send
- 2012-06-13 12:02:46.653457|INFO|after send (2)
- 2012-06-13 12:02:57.566898|INFO|before send
- 2012-06-13 12:02:57.566962|INFO|after send (2)
- 2012-06-13 12:03:08.234060|INFO|before send
- 2012-06-13 12:03:08.234101|INFO|after send (2)
- **2012-06-13 12:03:19.010743|INFO|before send**
- **2012-06-13 12:04:00.969162|INFO|after send (2)**
The default TCP send buffer size on the machine (Linux) is 65536 bytes.
The 2 bytes are a heartbeat to a server, and the server expects the client to send a heartbeat at least once every 15 seconds.
Also, I did not disable Nagle's algorithm.
The question is: can the send call block for as long as 40 seconds? It happens only sporadically; this time it happened after close to 12 hours of running.
As far as I know, the send call should just copy the data to the TCP send buffer.
publish is called every 10 seconds. No, it is not a gradual slowdown of the send call. It happens once, suddenly, and then the socket on the other side gets closed because of it, so the app exits.
    int publish(char* buff, int size) const {
        /* Adds the 0x0A to the end */
        buff[size] = _eolchar;
        if (_debugMode)
        {
            ACE_DEBUG((MY_INFO "before send\n"));
        }
        int ret = _socket.send((void*)buff, size + 1);
        if (_debugMode)
        {
            ACE_DEBUG((MY_INFO "after send (%d)\n", ret));
            //std::cout << "after send " << ret << std::endl;
        }
        if (ret < 1)
        {
            ACE_DEBUG((MY_ERROR "Socket error, FH going down\n"));
            ACE_OS::sleep(1);
            abort();
        }
        return ret;
    }
When using the blocking send() call, from your application's point of view you can think of the remote TCP buffer, the network, and the local TCP send buffer as one big buffer.
That is, if the remote application is slow to read new bytes from its TCP buffer, your local TCP buffer will eventually become (nearly) full. If you then try to send() a new payload that does not fit in the TCP buffer, the send() implementation (the kernel system call) won't return control to your application until the TCP buffer has enough room to store that payload.
The only way to reach that state is for the remote application not to read enough bytes. A typical scenario in a test environment is when the remote application pauses on a breakpoint ... :-)
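To see the "one big buffer" effect for yourself, here is a minimal sketch, assuming a Linux/POSIX stream socket. The function `bytes_until_would_block` is a hypothetical probe, not something from your code or from ACE: it writes filler bytes with `MSG_DONTWAIT` and reports how many the kernel accepted before a blocking send() would have had to wait.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical probe: writes filler bytes without ever blocking and returns
// how many bytes the kernel accepted before send() reported
// EAGAIN/EWOULDBLOCK. Returns -1 on any other socket error.
long bytes_until_would_block(int fd) {
    assert(fd >= 0);
    char filler[1024] = {0};
    long total = 0;
    for (;;) {
        // MSG_DONTWAIT makes this single call non-blocking;
        // MSG_NOSIGNAL avoids SIGPIPE if the peer has gone away.
        ssize_t n = ::send(fd, filler, sizeof filler,
                           MSG_DONTWAIT | MSG_NOSIGNAL);
        if (n > 0) { total += n; continue; }
        if (n < 0 && errno == EINTR) continue;  // interrupted, retry
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return total;  // buffers are full: a blocking send() would stall here
        }
        return -1;
    }
}
```

If the peer never reads, the first call returns a positive number and a second call returns 0, which is exactly the state in which a blocking send() hangs until the peer reads again.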
This is what we call a SLOW CONSUMER issue. If you agree with that diagnosis, there are multiple ways of getting rid of the issue:
- If you have control over the remote application, make it fast enough that the local application won't get blocked.
- If you don't have control over the remote application, there are several possible answers:
  - It may be acceptable for your needs to block for up to 40 seconds.
  - If not, you need to use a non-blocking version of the send() system call. From here, there are multiple possible policies; I describe one below. (Hold on please! :-) )
You can try to use a dynamic array that acts as a fake TCP send FIFO and grows when the send call returns EWOULDBLOCK. In this case you will likely have to use the select() system call to detect when the remote application catches up, and then send it the pending (unsent) data first.
It can be a little trickier than the simple publish() function you have here (though it is quite common in network applications). You should also know that nothing stops the dynamic buffer from growing until you have no free memory left, at which point your local application could crash. A typical policy in "real-time" network applications is to choose an arbitrary maximum size for the buffer and close the TCP connection when that size is reached, so that your local application does not run out of memory. Choose that maximum wisely, since it depends on the number of potential slow consumer connections.
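The policy described above could be sketched roughly as follows. This is only an illustration under assumptions: `QueuedSender` and its members are hypothetical names, it works on a raw POSIX descriptor rather than on your ACE socket object, and it uses poll() where the text mentions select() (either can report writability).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: a user-space FIFO in front of a non-blocking socket.
class QueuedSender {
public:
    enum class Status { Ok, Overflow, Error };

    QueuedSender(int fd, std::size_t max_pending)
        : fd_(fd), max_pending_(max_pending) {
        assert(fd >= 0 && max_pending > 0);
        // Switch the descriptor to non-blocking mode.
        int flags = ::fcntl(fd_, F_GETFL, 0);
        ::fcntl(fd_, F_SETFL, flags | O_NONBLOCK);
    }

    // Queue the payload behind older unsent data, then push what the kernel
    // accepts. Overflow means the backlog exceeds the cap: the caller should
    // close the connection instead of letting the queue grow further.
    Status publish(const char* data, std::size_t size) {
        pending_.append(data, size);
        Status st = flush();
        if (st == Status::Ok && pending_.size() > max_pending_) {
            return Status::Overflow;
        }
        return st;
    }

    // Send queued bytes, oldest first, until the queue is empty or the
    // kernel buffer is full. Never blocks.
    Status flush() {
        while (!pending_.empty()) {
            ssize_t n = ::send(fd_, pending_.data(), pending_.size(),
                               MSG_NOSIGNAL);
            if (n > 0) {
                pending_.erase(0, static_cast<std::size_t>(n));
                continue;
            }
            if (n < 0 && errno == EINTR) continue;
            if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
                return Status::Ok;  // kernel buffer full, keep the rest queued
            }
            return Status::Error;
        }
        return Status::Ok;
    }

    // Wait up to timeout_ms for the socket to become writable again.
    bool wait_writable(int timeout_ms) const {
        pollfd p{};
        p.fd = fd_;
        p.events = POLLOUT;
        return ::poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLOUT) != 0;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    int fd_;
    std::size_t max_pending_;
    std::string pending_;  // unsent bytes, oldest first
};
```

In a heartbeat loop you would call publish() every 10 seconds, call flush() whenever wait_writable() reports room, and treat Overflow as the signal to drop the slow consumer.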
The following (and more that I won't mention now) are considered blocking system calls:
send, connect, recv, accept.
This means that they can block for as long as they need until the specified job is done.
So yes, send can block for 40 seconds or more, depending on how long it takes to send the data, though I cannot know why it blocked that long in your specific case.
If you want to avoid this blocking, I advise you to read about asynchronous sockets and I/O.
They MIGHT solve part of your problem.
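Short of going fully asynchronous, one simpler option (a different technique from the one named above) is to keep the blocking socket but bound how long send() may block, using the standard `SO_SNDTIMEO` socket option. The sketch below assumes a Linux/POSIX descriptor; `set_send_timeout`, `send_with_timeout` and `SendResult` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

enum class SendResult { Sent, Partial, TimedOut, Failed };

// Limit how long a blocking send() on fd may wait for buffer space.
bool set_send_timeout(int fd, int millis) {
    assert(fd >= 0 && millis > 0);
    timeval tv{};
    tv.tv_sec = millis / 1000;
    tv.tv_usec = (millis % 1000) * 1000;
    return ::setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv) == 0;
}

// Blocking send that reports a timeout instead of hanging indefinitely.
SendResult send_with_timeout(int fd, const void* data, std::size_t size) {
    ssize_t n = ::send(fd, data, size, MSG_NOSIGNAL);
    if (n == static_cast<ssize_t>(size)) return SendResult::Sent;
    if (n >= 0) return SendResult::Partial;  // only part of the payload went out
    // With SO_SNDTIMEO set, a timeout with no data transferred shows up as
    // EAGAIN/EWOULDBLOCK.
    if (errno == EAGAIN || errno == EWOULDBLOCK) return SendResult::TimedOut;
    return SendResult::Failed;
}
```

With a timeout of a few seconds, the heartbeat sender finds out quickly that the peer has stalled and can log it or reconnect, rather than sitting inside send() for 40 seconds.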