C# Begin/EndReceive - how do I read large data?

2019-01-07 15:08发布

When reading data in chunks of say, 1024, how do I continue to read from a socket that receives a message bigger than 1024 bytes until there is no data left? Should I just use BeginReceive to read a packet's length prefix only, and then once that is retrieved, use Receive() (in the async thread) to read the rest of the packet? Or is there another way?

edit:

I thought Jon Skeet's link had the solution, but there is a bit of a speedbump with that code. The code I used is:

public class StateObject
{
    public Socket workSocket = null;
    public const int BUFFER_SIZE = 1024;
    public byte[] buffer = new byte[BUFFER_SIZE];
    public StringBuilder sb = new StringBuilder();
}

public static void Read_Callback(IAsyncResult ar)
{
    StateObject so = (StateObject) ar.AsyncState;
    Socket s = so.workSocket;

    int read = s.EndReceive(ar);

    if (read > 0) 
    {
        so.sb.Append(Encoding.ASCII.GetString(so.buffer, 0, read));

        if (read == StateObject.BUFFER_SIZE)
        {
            s.BeginReceive(so.buffer, 0, StateObject.BUFFER_SIZE, 0, 
                    new AyncCallback(Async_Send_Receive.Read_Callback), so);
            return;
        }
    }

    if (so.sb.Length > 0)
    {
        //All of the data has been read, so displays it to the console
        string strContent;
        strContent = so.sb.ToString();
        Console.WriteLine(String.Format("Read {0} byte from socket" + 
        "data = {1} ", strContent.Length, strContent));
    }
    s.Close();
}

Now this corrected works fine most of the time, but it fails when the packet's size is a multiple of the buffer. The reason for this is if the buffer gets filled on a read it is assumed there is more data; but the same problem happens as before. A 2 byte buffer, for exmaple, gets filled twice on a 4 byte packet, and assumes there is more data. It then blocks because there is nothing left to read. The problem is that the receive function doesn't know when the end of the packet is.


This got me thinking to two possible solutions: I could either have an end-of-packet delimiter or I could read the packet header to find the length and then receive exactly that amount (as I originally suggested).

There's problems with each of these, though. I don't like the idea of using a delimiter, as a user could somehow work that into a packet in an input string from the app and screw it up. It also just seems kinda sloppy to me.

The length header sounds ok, but I'm planning on using protocol buffers - I don't know the format of the data. Is there a length header? How many bytes is it? Would this be something I implement myself? Etc..

What should I do?

6条回答
可以哭但决不认输i
2楼-- · 2019-01-07 15:50

For info (general Begin/End usage), you might want to see this blog post; this approach is working OK for me, and saving much pain...

查看更多
ら.Afraid
3楼-- · 2019-01-07 15:51

There seems to be a lot of confusion surrounding this. The examples on MSDN's site for async socket communication using TCP are misleading and not well explained. The EndReceive call will indeed block if the message size is an exact multiple of the receive buffer. This will cause you to never get your message and the application to hang.

Just to clear things up - You MUST provide your own delimiter for data if you are using TCP. Read the following (this is from a VERY reliable source).

The Need For Application Data Delimiting

The other impact of TCP treating incoming data as a stream is that data received by an application using TCP is unstructured. For transmission, a stream of data goes into TCP on one device, and on reception, a stream of data goes back to the application on the receiving device. Even though the stream is broken into segments for transmission by TCP, these segments are TCP-level details that are hidden from the application. So, when a device wants to send multiple pieces of data, TCP provides no mechanism for indicating where the “dividing line” is between the pieces, since TCP doesn't examine the meaning of the data at all. The application must provide a means for doing this.

Consider for example an application that is sending database records. It needs to transmit record #579 from the Employees database table, followed by record #581 and record #611. It sends these records to TCP, which treats them all collectively as a stream of bytes. TCP will package these bytes into segments, but in a manner the application cannot predict. It is possible that each will end up in a different segment, but more likely they will all be in one segment, or part of each will end up in different segments, depending on their length. The records themselves must have some sort of explicit markers so the receiving device can tell where one record ends and the next starts.

Source: http://www.tcpipguide.com/free/t_TCPDataHandlingandProcessingStreamsSegmentsandSequ-3.htm

Most examples I see online for using EndReceive are wrong or misleading. It usually causes no problems in the examples because only one predefined message is sent and then the connection is closed.

查看更多
forever°为你锁心
4楼-- · 2019-01-07 15:53

You would read the length prefix first. Once you have that, you would just keep reading the bytes in blocks (and you can do this async, as you surmised) until you have exhausted the number of bytes you know are coming in off the wire.

Note that at some point, when reading the last block you won't want to read the full 1024 bytes, depending on what the length-prefix says the total is, and how many bytes you have read.

查看更多
闹够了就滚
5楼-- · 2019-01-07 15:57

Also I troubled same problem.

When I tested several times, I found that sometimes multiple BeginReceive - EndReceive makes packet loss. (This loop was ended improperly)

In my case, I used two solution.

First, I defined the enough packet size to make only 1 time BeginReceive() ~ EndReceive();

Second, When I receive large size of data, I used NetworkStream.Read() instead of BeginReceive() - EndReceive().

Asynchronous socket is not easy to use, and it need a lot of understanding about socket.

查看更多
姐就是有狂的资本
6楼-- · 2019-01-07 16:02

No - call BeginReceive again from the callback handler, until EndReceive returns 0. Basically, you should keep on receiving asynchronously, assuming you want the fullest benefit of asynchronous IO.

If you look at the MSDN page for Socket.BeginReceive you'll see an example of this. (Admittedly it's not as easy to follow as it might be.)

查看更多
我想做一个坏孩纸
7楼-- · 2019-01-07 16:04

Dang. I'm hesitant to even reply to this given the dignitaries that have already weighed in, but here goes. Be gentle, O Great Ones!

Without having the benefit of reading Marc's blog (it's blocked here due the corporate internet policy), I'm going to offer "another way."

The trick, in my mind, is to separate the receipt of the data from the processing of that data.

I use a StateObject class defined like this. It differs from the MSDN StateObject implementation in that it does not include the StringBuilder object, the BUFFER_SIZE constant is private, and it includes a constructor for convenience.

public class StateObject
{
    private const int BUFFER_SIZE = 65535;
    public byte[] Buffer = new byte[BUFFER_SIZE];
    public readonly Socket WorkSocket = null;

    public StateObject(Socket workSocket)
    {
        WorkSocket = workSocket;
    }
}

I also have a Packet class that is simply a wrapper around a buffer and a timestamp.

public class Packet
{
    public readonly byte[] Buffer;
    public readonly DateTime Timestamp;

    public Packet(DateTime timestamp, byte[] buffer, int size)
    {
        Timestamp = timestamp;
        Buffer = new byte[size];
        System.Buffer.BlockCopy(buffer, 0, Buffer, 0, size);
    }
}

My ReceiveCallback() function looks like this.

public static ManualResetEvent PacketReceived = new ManualResetEvent(false);
public static List<Packet> PacketList = new List<Packet>();
public static object SyncRoot = new object();
public static void ReceiveCallback(IAsyncResult ar)
{
    try {
        StateObject so = (StateObject)ar.AsyncState;
        int read = so.WorkSocket.EndReceive(ar);

        if (read > 0) {
            Packet packet = new Packet(DateTime.Now, so.Buffer, read);
            lock (SyncRoot) {
                PacketList.Add(packet);
            }
            PacketReceived.Set();
        }

        so.WorkSocket.BeginReceive(so.Buffer, 0, so.Buffer.Length, 0, ReceiveCallback, so);
    } catch (ObjectDisposedException) {
        // Handle the socket being closed with an async receive pending
    } catch (Exception e) {
        // Handle all other exceptions
    }
}

Notice that this implementation does absolutely no processing of the received data, nor does it have any expections as to how many bytes are supposed to have been received. It simply receives whatever data happens to be on the socket (up to 65535 bytes) and stores that data in the packet list, and then it immediately queues up another asynchronous receive.

Since processing no longer occurs in the thread that handles each asynchronous receive, the data will obviously be processed by a different thread, which is why the Add() operation is synchronized via the lock statement. In addition, the processing thread (whether it is the main thread or some other dedicated thread) needs to know when there is data to process. To do this, I usually use a ManualResetEvent, which is what I've shown above.

Here is how the processing works.

static void Main(string[] args)
{
    Thread t = new Thread(
        delegate() {
            List<Packet> packets;
            while (true) {
                PacketReceived.WaitOne();
                PacketReceived.Reset();
                lock (SyncRoot) {
                    packets = PacketList;
                    PacketList = new List<Packet>();
                }

                foreach (Packet packet in packets) {
                    // Process the packet
                }
            }
        }
    );
    t.IsBackground = true;
    t.Name = "Data Processing Thread";
    t.Start();
}

That's the basic infrastructure I use for all of my socket communication. It provides a nice separation between the receipt of the data and the processing of that data.

As to the other question you had, it is important to remember with this approach that each Packet instance does not necessarily represent a complete message within the context of your application. A Packet instance might contain a partial message, a single message, or multiple messages, and your messages might span multiple Packet instances. I've addressed how to know when you've received a full message in the related question you posted here.

查看更多
登录 后发表回答