I'm reading a stream of data through TCP/IP socket. The stream load is very uneven. Sometimes large bulks of data arrive every second, sometimes no data come for an hour. In the case of long inactivity period (no data from remote server, but connection is still online) my program should take some actions.
I'm implementing a timeout using a select(). It tells me if there are data ready, but I don't know exactly how much can I read without causing read() to block. Blocking is unacceptable as it may last far longer than the timeout I need.
For the sake of efficiency, stream is read into large buffer and read() call is provided with that buffer size.
Will read() block after select() if the buffer to be filled is greater than amount of data available right now in the socket?
Actually it should not block (that is what select() is for!), but in fact, it might, exceptionally. Normally, read() should return up to the maximum number of bytes that you've specified, which possibly includes zero bytes (this is actually a valid thing to happen!), but it should never block after previously having reported readiness.
Nevertheless, see the Linux select man page:
Under Linux, select() may report a
socket file descriptor as "ready for
reading", while nevertheless a
subsequent read blocks. This could
for example happen when data has
arrived but upon examination has wrong
checksum and is discarded. There may
be other circumstances in which a file
descriptor is spuriously reported as
ready. Thus it may be safer to use
O_NONBLOCK on sockets that should not
block.
There is O_NONBLOCK
which can be set by fcntl
/F_SETFL
and should result in non-blocking read
.
A blocking file descriptor will block on read() until there is something to read - could be one byte or your entire request. A non-blocking descriptor won't block on read() if there is nothing to read. Select() is not read(). It basically puts the process to sleep and monitors the file descriptor(s), including non-blocking descriptors. When there is activity on one of the descriptors (or the timeout period expires) select returns and you can read your data, or do something else in the case of the timeout.
So you have two separate issues. (1) You want to "take some actions" when there is no data. That's the select timeout. (2) Once you have data (notified by select) you don't want to block on a read. That's the non-blocking mode. When you get EAGAIN on the non-blocking read you loop back to the select and/or "take some actions" and loop back to select.
No, read()
will read up to specified size and will return the actual bytes read, which can be less.
You can use recv() which doesn't block by default( if the flag MSG_WAITALL is not specified)