I have a child process which generates some output of variable length and then sends it to the parent using a half duplex pipe. In the parent, how do I use the read() function? Since the data can be of different length each time, how can I at run time know the size of the data to do any malloc() for a buffer? Can the fstat() function be used on a pipe file descriptor?
I know that the read() function will read a specified number of bytes but will return 0 if the end of file (not the EOF character) is reached before the bytes requested have been read.
I am specifically running Ubuntu GNU/Linux with a 2.6.27-9 Kernel.
All examples in Richard Stevens' Advanced Programming in the UNIX Environment have specified either the length of data while writing into the pipe or have relied on fgets() stdio.h function. Since I am concerned with speed, I want to stay away from using stdio.h as much as possible.
Will this be necessarily faster with shared memory?
Thanks, -Dhruv
You can't get any size information from a pipe, since there is no size.
You need to either use a defined size, or a delimiter.
In other words, in the child, output the size of the upcoming output as an int, then write the actual output; in the parent, you read the size (it's an int, so it's always the same size), then read that many bytes.
Or: define an end character and until you see that, assume you need to keep reading. This may require some sort of escaping/encoding mechanism, however, and probably won't be as fast. I think this is basically what fgets does.
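A minimal sketch of the size-prefix approach, assuming both ends run on the same machine so a raw `uint32_t` in native byte order is an acceptable header. The names `read_full`, `write_full`, `send_msg` and `recv_msg` are made up for illustration; the loops are there because `read()` and `write()` may transfer fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Read exactly n bytes; 0 on success, -1 on error or premature EOF. */
int read_full(int fd, void *buf, size_t n)
{
    char *p = buf;
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;
        }
        if (r == 0)
            return -1;              /* writer closed before we got n bytes */
        p += r;
        n -= (size_t)r;
    }
    return 0;
}

/* Write exactly n bytes; 0 on success, -1 on error. */
int write_full(int fd, const void *buf, size_t n)
{
    const char *p = buf;
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Child side: fixed-size length header first, then the payload. */
int send_msg(int fd, const void *data, uint32_t len)
{
    if (write_full(fd, &len, sizeof len) < 0)
        return -1;
    return write_full(fd, data, len);
}

/* Parent side: read the header, malloc() exactly that much (plus a
 * terminating NUL for convenience), then read the payload.
 * Returns a buffer the caller must free(), or NULL on error/EOF. */
char *recv_msg(int fd, uint32_t *len)
{
    assert(len != NULL);
    if (read_full(fd, len, sizeof *len) < 0)
        return NULL;
    char *buf = malloc((size_t)*len + 1);
    if (buf == NULL)
        return NULL;
    if (read_full(fd, buf, *len) < 0) {
        free(buf);
        return NULL;
    }
    buf[*len] = '\0';
    return buf;
}
```

The child would call `send_msg(pipefd[1], data, len)` and the parent `recv_msg(pipefd[0], &len)`; a real program would probably also want an upper bound on `len` before calling `malloc()`.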
Since it seems that you intend to make a single read of all the data from the pipe, I think the following will serve you better than delimiter+encoding or miniheader techniques suggested in other answers:
From the pipe (7) manpage:
The following example was taken from the pipe (2) manpage and reversed so that the child does the writing and the parent the reading (just to be sure). I also added a variable-size buffer. The child will sleep for 5 seconds. The delay ensures that the exit() of the child can have nothing to do with the pipe I/O (the parent will print a complete line before the child exits).
From your comment I see now that you may want to read the data as it becomes available, to update the UI or whatever, to reflect your system's status. To do that, open the pipe in non-blocking (O_NONBLOCK) mode. Read repeatedly whatever is available until -1 is returned and errno == EAGAIN, then do your parsing. Repeat until read() returns 0, which indicates that the child has closed the pipe.
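Roughly, the non-blocking loop could look like the sketch below. `set_nonblocking` and `drain_available` are hypothetical helper names, and the caller is assumed to supply a buffer and to do its own parsing between calls.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch an already-open descriptor (e.g. the read end of the pipe)
 * to non-blocking mode. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Read whatever is available right now into buf (capacity cap).
 * Returns the number of bytes stored, or -1 on a real error.
 * *eof is set to 1 once read() has returned 0 (all writers closed). */
ssize_t drain_available(int fd, char *buf, size_t cap, int *eof)
{
    size_t used = 0;

    assert(buf != NULL && eof != NULL);
    *eof = 0;
    while (used < cap) {
        ssize_t r = read(fd, buf + used, cap - used);
        if (r > 0) {
            used += (size_t)r;
            continue;
        }
        if (r == 0) {               /* end of file: child closed the pipe */
            *eof = 1;
            break;
        }
        if (errno == EINTR)
            continue;               /* interrupted, try again */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                  /* nothing more available for now */
        return -1;
    }
    return (ssize_t)used;
}
```

In practice you would probably wait in poll() or select() between calls rather than spin on `drain_available`.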
To use an in-memory buffer with the FILE* functions, you can use fmemopen() in the GNU C library.
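For instance, once the bytes from the pipe are in a buffer, something like this lets you run fgets() over them. The helper name `first_line_from_memory` is invented for the example, and the feature-test macro is only there because fmemopen() is a POSIX.1-2008 function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>

/* Copy the first line of an in-memory buffer into line (capacity cap).
 * Returns 0 on success, -1 if the stream could not be opened or is empty. */
int first_line_from_memory(char *data, size_t len, char *line, int cap)
{
    assert(data != NULL && line != NULL && cap > 0);

    FILE *fp = fmemopen(data, len, "r");   /* stream backed by data[0..len) */
    if (fp == NULL)
        return -1;

    char *ok = fgets(line, cap, fp);       /* ordinary stdio call */
    fclose(fp);                            /* does not free data */
    return ok != NULL ? 0 : -1;
}
```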
Since the writing end can always write more data to the pipe, there's no way to know the size of the data in it. You can have the sender write the length first, or you can allocate a largish buffer, read as much as you can, then resize the buffer if it's not large enough.
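A sketch of the "largish buffer, then resize" variant, which reads until the writer closes its end. `read_all` is a hypothetical name, and the 4096-byte starting size and the doubling policy are arbitrary choices.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* Read from fd until end of file, growing the buffer as needed.
 * Returns a malloc()ed buffer (caller frees) and stores its length
 * in *out_len, or returns NULL on error. */
char *read_all(int fd, size_t *out_len)
{
    size_t cap = 4096, len = 0;
    char *buf;

    assert(out_len != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len == cap) {                   /* buffer full: double it */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        ssize_t r = read(fd, buf + len, cap - len);
        if (r > 0) {
            len += (size_t)r;
            continue;
        }
        if (r == 0)
            break;                          /* all writers closed the pipe */
        if (errno == EINTR)
            continue;
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}
```

Note that the parent must close its own copy of the write end before calling this, otherwise read() never returns 0.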
Shared memory will be faster as it avoids copies and may avoid some syscalls, but the locking protocols needed to transfer data across shmem are more complex and prone to error, so it's generally best to avoid shared memory unless you absolutely need it. Additionally, with shared memory you must set a fixed maximum size to the data to be transferred when you allocate the buffer.
You might try using IPC message queues if your messages are not too big.
Why not write the length into the pipe as (say) the first 'n' bytes? Then at the other end you can read those bytes, determine the length, and then read that number of bytes (i.e. you have a very simple protocol).
Other posters are correct: you must have a way to specify the length of the packets yourself. One concrete, practical way to do this is with netstrings. It's simple to create and parse, and it's supported by some common frameworks such as Twisted.
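For reference, a netstring is the decimal length, a colon, the payload, and a trailing comma, e.g. `12:hello world!,`. Below is a rough sketch of an encoder and a parser in C; the function names are made up, and the parser does not reject leading zeros in the length as a strict implementation would.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<len>:<data>," in a malloc()ed buffer; caller frees.
 * Stores the encoded size in *out_len. Returns NULL on error. */
char *netstring_encode(const char *data, size_t len, size_t *out_len)
{
    char head[32];
    int h = snprintf(head, sizeof head, "%zu:", len);
    if (h < 0 || (size_t)h >= sizeof head)
        return NULL;

    char *out = malloc((size_t)h + len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, head, (size_t)h);
    memcpy(out + h, data, len);
    out[(size_t)h + len] = ',';
    *out_len = (size_t)h + len + 1;
    return out;
}

/* Try to parse one netstring from buf[0..n).
 * Returns 0 on success (payload points into buf), 1 if more bytes
 * are needed, -1 if the input is malformed. */
int netstring_parse(const char *buf, size_t n, const char **payload,
                    size_t *plen, size_t *consumed)
{
    size_t i = 0, len = 0;

    assert(payload != NULL && plen != NULL && consumed != NULL);
    if (n == 0)
        return 1;
    if (!isdigit((unsigned char)buf[0]))
        return -1;
    while (i < n && isdigit((unsigned char)buf[i])) {
        if (len > (SIZE_MAX - 9) / 10)
            return -1;                      /* length would overflow */
        len = len * 10 + (size_t)(buf[i] - '0');
        i++;
    }
    if (i == n)
        return 1;                           /* header not complete yet */
    if (buf[i] != ':')
        return -1;
    i++;
    if (n - i <= len)
        return 1;                           /* payload or ',' still missing */
    if (buf[i + len] != ',')
        return -1;
    *payload = buf + i;
    *plen = len;
    *consumed = i + len + 1;
    return 0;
}
```

The "need more bytes" return value lets the parent keep appending to its buffer from read() and retry the parse until a whole message has arrived.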