Why does TCP/IP on Windows 7 take 500 sends to warm up?

Posted 2019-04-21 12:33

Question:

We are seeing a bizarre and unexplained phenomenon with ZeroMQ on Windows 7 when sending messages over TCP (or over inproc, since ZeroMQ uses TCP internally for signalling on Windows).

The phenomenon is that the first 500 messages arrive slower and slower, with latency rising steadily. Then latency drops and messages arrive consistently rapidly, except for spikes caused by CPU/network contention.

The issue is described here: https://github.com/zeromq/libzmq/issues/1608

It is consistently 500 messages. If we send without a delay, then messages are batched so we see the phenomenon stretch over several thousand sends. If we delay between sends, we see the graph more clearly. Even delaying as much as 50-100 msec between sends does not change things.

Message size is also irrelevant. I've tested with 10-byte messages and 10K messages, with the same results.

The maximum latency is always 2 msec (2,000 usec).
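
For anyone who wants to reproduce the curve without ZeroMQ, here is a minimal sketch of the kind of measurement described above, using a plain loopback TCP socket. It is not our actual test harness: the function name `measure_latencies`, the frame layout (an 8-byte timestamp plus padding) and the use of `TCP_NODELAY` are all assumptions made for illustration.

```python
import socket
import struct
import threading
import time

STAMP = struct.Struct("!q")  # send timestamp in nanoseconds (assumed frame layout)

def recv_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)

def measure_latencies(count=1000, delay_s=0.001, size=10):
    """Return one-way latency in microseconds for each message on a fresh connection."""
    pad = bytes(max(0, size - STAMP.size))
    frame_len = STAMP.size + len(pad)
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # let the OS pick a free port
    listener.listen(1)
    latencies = []

    def receiver():
        conn, _ = listener.accept()
        with conn:
            for _ in range(count):
                frame = recv_exact(conn, frame_len)
                (sent_ns,) = STAMP.unpack_from(frame)
                latencies.append((time.perf_counter_ns() - sent_ns) / 1000.0)

    thread = threading.Thread(target=receiver)
    thread.start()
    with socket.create_connection(listener.getsockname()) as sender:
        # Assumption: disable Nagle so sender-side coalescing does not mask the effect.
        sender.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            sender.sendall(STAMP.pack(time.perf_counter_ns()) + pad)
            time.sleep(delay_s)
    thread.join()
    listener.close()
    return latencies
```

Calling `measure_latencies(count=2000, delay_s=0.05)` and plotting the returned list against the message index should show whether the first few hundred messages behave differently on a given machine. Both ends run in one process so the two timestamps share a clock.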

On Linux boxes we do not see this phenomenon.

What we'd like to do is eliminate this initial curve, so messages leave on a fresh connection with their normal low latency (around 20-100 usec).


Update: the issue does not appear on Windows 8 or Windows 10. It seems to happen only on Windows 7.

Answer 1:

We've found the cause and a workaround. This is a general issue with all TCP activity on Windows 7 (at least), caused by buffering on the receiver side. You can find some hints online under "TCP slow start."

On a new connection, or if the connection is idle for (I think) 150 msec or more, the receiver buffers incoming packets and does not deliver them to the application until the receive buffer is full and/or some timeout expires (it's unclear which).

Our workaround in ZeroMQ, where we are using TCP sockets for interthread signalling, is to send a dummy chunk of data on new signal pairs. This forces the TCP stack to work "normally" and we then see consistent latencies of around 100-150 usec.
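
To make the idea concrete, here is a rough sketch of the same approach on a plain loopback socket pair. It is not the ZeroMQ code: the helper names `make_loopback_pair` and `warm_up` are hypothetical, and the dummy size of 1 MiB is an assumption, since the right amount likely depends on the receive buffer size.

```python
import socket
import threading

WARMUP_BYTES = 1024 * 1024  # assumed dummy size, not a documented value

def make_loopback_pair():
    """Create a connected (writer, reader) TCP pair over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    writer = socket.create_connection(listener.getsockname())
    reader, _ = listener.accept()
    listener.close()
    writer.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return writer, reader

def warm_up(writer, reader, size=WARMUP_BYTES):
    """Push `size` dummy bytes through the pair and discard them.

    Returns the number of bytes drained on the reader side.
    """
    drained = 0

    def drain():
        nonlocal drained
        while drained < size:
            # Never read past the dummy data, so real traffic is left untouched.
            chunk = reader.recv(min(65536, size - drained))
            if not chunk:
                break
            drained += len(chunk)

    # Drain on a separate thread so a large send cannot block on full buffers.
    thread = threading.Thread(target=drain)
    thread.start()
    writer.sendall(bytes(size))
    thread.join()
    return drained
```

You would call `warm_up(writer, reader)` once, right after creating the pair and before handing the sockets to the code that uses them for signalling.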

I'm not sure whether this is generally useful; for most applications it's profitable to wait a little on reception, so the TCP stack can deliver more to the calling application.

However for apps that send many small messages, this workaround may be helpful.

Note that if the connection is idle, the slow start happens again, so connections should heartbeat every 100 msec or so, if this is critical.
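
A heartbeat along those lines could look roughly like the sketch below. The class name `HeartbeatSender` and the one-byte `HEARTBEAT` frame are hypothetical; a real protocol needs framing that lets the receiver tell heartbeats from data and drop them.

```python
import socket
import threading
import time

HEARTBEAT = b"\x00"  # hypothetical one-byte frame that the receiver must ignore

class HeartbeatSender:
    """Wrap a connected socket and send a heartbeat whenever it has been idle."""

    def __init__(self, sock, interval_s=0.1):
        self._sock = sock
        self._interval = interval_s
        self._lock = threading.Lock()
        self._last_send = time.monotonic()
        self._stop = threading.Event()
        self.heartbeats_sent = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, data):
        """Send application data and reset the idle timer."""
        with self._lock:
            self._sock.sendall(data)
            self._last_send = time.monotonic()

    def _run(self):
        # Check four times per interval, so the idle gap stays below 1.25 intervals.
        while not self._stop.wait(self._interval / 4):
            with self._lock:
                if time.monotonic() - self._last_send >= self._interval:
                    self._sock.sendall(HEARTBEAT)
                    self._last_send = time.monotonic()
                    self.heartbeats_sent += 1

    def close(self):
        """Stop the heartbeat thread; the caller still owns the socket."""
        self._stop.set()
        self._thread.join()
```

All application writes go through `send()`, so heartbeats are only emitted when the connection would otherwise sit idle. With the default 100 msec interval, the longest silent gap is about 125 msec, which stays under the 150 msec figure guessed at above.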