recv() data of unknown size with Berkeley Sockets

Posted 2020-07-27 05:52

Question:

I have C++ code in which I use recv() from Berkeley Sockets to receive data from a remote host. The issue is that I do not know the size of the data (which is variable), so I probably need some kind of timeout option to make this work.

Since I'm new to socket programming, I was wondering how, for example, a web client handles responses from a server (e.g. a server sending HTML data to the client). Does it use some kind of timeout, since it doesn't know how big the page is? The same question applies to an FTP client.

Answer 1:

When your data is of variable length, it is typically framed within another container. That is to say, a header precedes the actual data block and tells the receiver how much data it should accept.
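To make the framing idea concrete, here is a minimal sketch of the sending side. The wire format (a 4-byte big-endian length followed by the payload) and the name encode_frame are assumptions chosen for illustration, not something any particular protocol mandates.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical wire format (an assumption for this sketch):
// 4-byte big-endian payload length, followed by the payload bytes.
std::string encode_frame(const std::string& payload) {
    // The length has to fit into the 4-byte header field.
    assert(payload.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t len = static_cast<std::uint32_t>(payload.size());

    std::string frame;
    frame.reserve(4 + payload.size());
    // Write the length most significant byte first (network byte order).
    frame.push_back(static_cast<char>((len >> 24) & 0xFF));
    frame.push_back(static_cast<char>((len >> 16) & 0xFF));
    frame.push_back(static_cast<char>((len >> 8) & 0xFF));
    frame.push_back(static_cast<char>(len & 0xFF));
    frame += payload;
    return frame;
}
```

The receiver then knows it must first read 4 bytes, decode the length, and read exactly that many payload bytes.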

For example, HTTP uses line breaks (CRLF) to delimit its header lines. If the message has a variable-length body, the header typically includes a "Content-Length:" field that indicates exactly how many bytes to read once the entire header has been received (the header ends at the first empty line, i.e. when you read two consecutive line breaks).
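As a rough sketch of that logic: the helper below inspects everything received so far, reports whether the header is complete, and extracts the Content-Length value. The names HeaderInfo and parse_header are made up for this example, and it is deliberately simplified (no chunked transfer encoding, no overflow checks), so treat it as an illustration rather than a real HTTP parser.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper type for this sketch.
struct HeaderInfo {
    std::size_t header_size;     // bytes up to and including the empty line
    std::size_t content_length;  // value of Content-Length, 0 if absent
};

// Returns false while the header is still incomplete (keep calling recv()
// and appending to `buf`), true once the terminating empty line was seen.
bool parse_header(const std::string& buf, HeaderInfo& info) {
    const std::size_t end = buf.find("\r\n\r\n");
    if (end == std::string::npos) return false;

    info.header_size = end + 4;
    info.content_length = 0;

    // Field names are case-insensitive, so search a lower-cased copy.
    std::string head = buf.substr(0, end);
    std::transform(head.begin(), head.end(), head.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    const std::string key = "\r\ncontent-length:";
    std::size_t pos = head.find(key);
    if (pos != std::string::npos) {
        pos += key.size();
        // Skip optional whitespace before the number.
        while (pos < head.size() && (head[pos] == ' ' || head[pos] == '\t')) ++pos;
        while (pos < head.size() && std::isdigit(static_cast<unsigned char>(head[pos]))) {
            info.content_length = info.content_length * 10 + (head[pos] - '0');
            ++pos;
        }
    }
    return true;
}
```

Once parse_header returns true, the body consists of the content_length bytes that start at offset header_size. Part of the body may already be in the buffer, and the rest still has to be received.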

With a binary protocol that uses, say, a 4-byte length prefix, it is perfectly fine to read 4 bytes from the socket, find out how much data follows, then do another receive and read the rest. Just be careful: when you ask for 4 bytes, the socket might give you anywhere from 1 to 4 bytes, so anything less than 4 means you need to go back and ask for the remaining bytes. Overlooking this is a very common mistake. In a dev environment you will almost always get 4 bytes when asking for 4, but once you deploy your app, some machine somewhere will see random crashes because its network behaves differently.

Generally, relying on timeouts to determine when you have reached the end of the data is a bad approach. With a timeout, you might get things "reliably" working in a well-controlled dev environment, but it is a very flaky solution. Any CPU, disk, or network hiccup might cause your app to stop receiving prematurely. You are also limiting your data throughput and responsiveness, since your app sleeps for some time interval instead of doing work.