Improving UDP reliability

Posted 2019-05-13 21:48

Question:

I am building a small UDP-based server. The server is built on .Net and uses the Socket class itself. I'm using completion ports through ReceiveMessageFromAsync, and the async send.

My problem is that I'm losing around 5%-10% of my traffic. I understand some loss is normal, but is there any way of improving this statistic?

Answer 1:

You might want to look at the answers to this question before rolling your own reliability layer on top of UDP... What do you use when you need reliable UDP?

Alternatively, you can try to increase the amount of data that gets through by making the socket's send and receive buffers as large as possible, setting the appropriate socket options before you start receiving.
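In .NET this just means setting the socket's buffer sizes before the first receive. A minimal sketch, assuming a UDP Socket like the one in the question; the 4 MB figure and the port number are arbitrary illustrative values, and the OS may clamp the buffers to its own limits:

    using System;
    using System.Net;
    using System.Net.Sockets;

    // Enlarge the send/receive buffers on a UDP socket before the first receive.
    var socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp);

    socket.ReceiveBufferSize = 4 * 1024 * 1024;   // SO_RCVBUF (4 MB is illustrative)
    socket.SendBufferSize    = 4 * 1024 * 1024;   // SO_SNDBUF

    socket.Bind(new IPEndPoint(IPAddress.Any, 5000));   // port 5000 is illustrative

    // Read the values back: the OS may have clamped them to its own limits.
    Console.WriteLine($"recv buffer: {socket.ReceiveBufferSize}, send buffer: {socket.SendBufferSize}");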



Answer 2:

Ensure that you do not send UDP datagrams larger than the path MTU (typically no more than ~1400 bytes of payload, and sometimes less). Such datagrams will be fragmented into multiple IP packets and reassembled at the destination, and if any one of those fragments is lost, the entire UDP datagram is discarded.

This has an amplification effect on the packet loss rate: if each fragment is lost independently with probability p, a datagram carried in n fragments only arrives when all n fragments do, so its loss rate is 1 - (1 - p)^n. This table shows how the UDP datagram loss rate goes up dramatically as the number of fragments used to carry it increases:

Underlying Fragment Loss Rate: 1.00%

Fragments   UDP Datagram Loss Rate
--------------------------------------
1           1.00%
2           1.99%
3           2.97%
4           3.94%
5           4.90%
6           5.85%
7           6.79%
8           7.73%
9           8.65%
10          9.56%
15          13.99%
20          18.21%
30          26.03%
40          33.10%
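
For reference, a small sketch that reproduces the table from that formula (per-fragment loss p = 1%):

    using System;

    // Datagram loss rate = 1 - (1 - p)^n for n fragments, each lost with probability p.
    double p = 0.01;
    foreach (int n in new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 30, 40 })
    {
        double loss = 1.0 - Math.Pow(1.0 - p, n);
        Console.WriteLine($"{n,2} fragments -> {loss:P2} datagram loss");
    }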


Answer 3:

The Windows socket architecture seems adverse to good UDP performance because packet buffers are copied multiple times from the kernel, through protocol handlers, to the application. MSDN tends to point developers at Winsock Kernel (WSK), which replaced the former Transport Driver Interface (TDI), if they want reasonable datagram performance, such as for implementing a reliable UDP protocol.

However, it might just be non-stellar Windows drivers for your NIC hardware: I see great performance on Linux with Broadcom hardware, but less than 25% of that performance on Windows. Some of this appears to be due to a lack of transmit interrupt coalescing; Windows performance monitoring always reports zero coalesces for transmits but a variable range for receives. On Linux I can tune the coalescing and see distinct performance changes. Broadcom's driver software only appears to support transmit coalescing on later hardware releases.

Coalescing means that packets are moved into and out of the NIC in batches; batching packets usually means lower CPU usage and less chance of drops due to full buffers or other system activity.

So whilst it is probably impractical to change the OS, you can try different hardware to minimize the impact of limited drivers.



Answer 4:

About all you can do is something along the same general lines as TCP: keep track of which packets were received, and send something back to ACK/NAK packets so that those that didn't arrive get re-sent.
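
A minimal sender-side sketch of that idea, assuming a protocol where each datagram carries a 4-byte sequence number and the receiver echoes that number back as an ACK; all names here are illustrative, not part of any existing library:

    using System;
    using System.Collections.Concurrent;
    using System.Net;
    using System.Net.Sockets;
    using System.Threading.Tasks;

    // Sketch: each outgoing datagram gets a sequence number and is kept until ACKed.
    class ReliableUdpSender
    {
        private readonly UdpClient _udp = new UdpClient();
        private readonly ConcurrentDictionary<uint, byte[]> _unacked = new();
        private uint _nextSeq;

        public ReliableUdpSender(IPEndPoint remote) => _udp.Connect(remote);

        public async Task SendAsync(byte[] payload)
        {
            uint seq = _nextSeq++;
            var packet = new byte[payload.Length + 4];
            BitConverter.GetBytes(seq).CopyTo(packet, 0);   // prepend sequence number
            payload.CopyTo(packet, 4);

            _unacked[seq] = packet;                         // remember until acknowledged
            await _udp.SendAsync(packet, packet.Length);
        }

        // Call when an ACK for `seq` arrives from the receiver.
        public void OnAck(uint seq) => _unacked.TryRemove(seq, out _);

        // Periodically re-send anything that has not been acknowledged yet.
        public async Task ResendUnackedAsync()
        {
            foreach (var packet in _unacked.Values)
                await _udp.SendAsync(packet, packet.Length);
        }
    }

In practice you would also want per-packet timeouts, a cap on retransmissions, and receiver-side de-duplication, since retransmitted packets can arrive twice.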



Answer 5:

Another aspect of reducing packet loss is to moderate the rate at which you send packets. If you send data faster than the bottleneck point on the path can handle, that will show up as dropped packets. Even if your average data rate is quite low, you may still be sending bursts of packets in quick succession.

TCP handles this by limiting the number of outstanding, unacknowledged data bytes to a value called the congestion window. The congestion window starts small and is slowly increased until packet loss occurs, at which point it is scaled back (progressively more so if loss continues). You could implement something similar, provided your protocol notifies the sender of packet loss.
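
A minimal sketch of such a window, assuming the sender counts packets in flight and learns about losses (for example via NAKs or ACK timeouts); the constants and the additive-increase/multiplicative-decrease policy here are illustrative, loosely modelled on TCP congestion avoidance:

    using System;

    // Sketch: limit the number of unacknowledged packets in flight (AIMD policy).
    class CongestionWindow
    {
        private double _windowPackets = 4;   // start small (illustrative seed value)
        private int _inFlight;               // packets sent but not yet acknowledged

        public bool CanSend() => _inFlight < (int)_windowPackets;

        public void OnSent() => _inFlight++;

        public void OnAck()
        {
            _inFlight = Math.Max(0, _inFlight - 1);
            _windowPackets += 1.0 / _windowPackets;            // additive increase per ACK
        }

        public void OnLoss()
        {
            _inFlight = Math.Max(0, _inFlight - 1);
            _windowPackets = Math.Max(1, _windowPackets / 2);  // multiplicative decrease
        }
    }

The sending loop would check CanSend() before each transmission, call OnSent() after it, and call OnAck()/OnLoss() as feedback arrives, so bursts are naturally spread out to roughly what the path has recently been able to carry.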