So I'm making a UNIX minishell, and am trying to add pipelines, so I can do things like this:
ps aux | grep dh | grep -v grep | cut -c1-5
However, I'm having trouble wrapping my head around the piping part. I replace all the "|" characters with 0, and then run each segment as a normal line. The part I'm stuck on is redirecting the input and output: the input of a command needs to be the output of the previous command, and the output of a command needs to be the input of the next command.
I'm doing this using pipes, but I can't figure out where to call pipe() and where to close the descriptors. From the main processing function, processline(), I have this code:
if((pix = findUnquotChar(line_itr, '|')))
{
line_itr[pix++] = 0;
if(pipe (fd) < 0) perror("pipe");
processline(line_itr, inFD, fd[1], pl_flags);
line_itr = &(line_itr[pix]);
while((pix = findUnquotChar(line_itr, '|')) && pix < line_len)
{
line_itr[pix++] = 0;
//? if(pipe (fd) < 0) perror("pipe");
processline(line_itr, fd[0], fd[1], pl_flags);
line_itr = &(line_itr[pix]);
//? close(fd[0]);
//? close(fd[1]);
}
return;
}
So, I'm recursively (the code above is in processline) sending the commands in between the "|" to be processed by processline. You can see where I commented out code above; I'm not sure how to make it work. The 2nd and 3rd parameters of processline are the inputFD and outputFD respectively, so I need to process a command, write the output to a pipe, and then call processline again on the next command, this time with the output of the previous command as the input. This just doesn't seem like it can work though, because each time I close fd[0] I'm losing the previous output. Do I need two separate pipes that I can flip-flop back and forth between?
I'm just having trouble seeing how this is possible with a single pipe, if you guys need any additional info just ask. Here's the entire processline function in case you want to take a look:
EDIT: If anybody has an example of a shell that implements pipelines I would love a link to the source; I haven't been able to find one on Google so far.
EDIT2: Here's an example of my predicament:
echo a | echo b | echo c
So first I would call the shell like this:
processline("echo a", 0, fd[1], flags);
....
processline("echo b", fd[0], NOT_SURE_GOES_HERE[1], flags);
....
processline("echo c", NOT_SURE_GOES_HERE[0], NOT_SURE_EITHER[1], flags);
Each of these occurs once per iteration, and as you can see I can't figure out what to pass for the input file descriptors and the output file descriptors for the 2nd, 3rd (and so on) iterations.
Here's some moderately generic but simple code to execute pipelines, a program I'm calling pipeline. It's an SSCCE in a single file as presented, though I'd have the files stderr.h and stderr.c as separate files in a library to be linked with all my programs. (Actually, I have a more complex set of functions in my 'real' stderr.c and stderr.h, but this is a good starting point.)
The code operates in two ways. If you supply no arguments, then it runs a built-in pipeline:
This counts the number of times each person is logged in on the system, presenting the list in order of increasing number of sessions. Alternatively, you can invoke it with a sequence of arguments that are the command line you want invoked, using a quoted pipe '|' (or "|") to separate commands:
Valid:
Invalid:
The last three invocations enforce 'pipes as separators'. The code does not error check every system call; it does error check fork(), execvp() and pipe(), but skips checking on dup2() and close(). It doesn't include diagnostic printing for the commands that are generated; a -x option to pipeline would be a sensible addition, causing it to print out a trace of what it does. It also does not exit with the exit status of the last command in the pipeline.
Note that the code starts with a child being forked. The child will become the last process in the pipeline, but first creates a pipe and forks another process to run the earlier processes in the pipeline. The mutually recursive functions are unlikely to be the only way of sorting things out, but they do leave minimal code repetition (earlier drafts of the code had the content of exec_nth_command() largely repeated in exec_pipeline() and exec_pipe_command()).
The process structure here is such that the original process only knows about the last process in the pipeline. It is possible to redesign things in such a way that the original process is the parent of every process in the pipeline, so the original process can report separately on the status of each command in the pipeline. I've not yet modified the code to allow for that structure; it will be a little more complex, though not hideously so.
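For illustration, here is a minimal sketch of that alternative structure, in which the original process forks every command itself and so can wait on each one. It is not the pipeline program described above; the name run_pipeline() and its argument layout are assumptions made for this example. It also shows one way to answer the "which descriptors do I pass?" question: each iteration creates a fresh pipe for its output and reads from the read end saved by the previous iteration, so no flip-flopping between two fixed pipes is needed.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the original program): run ncmds commands
 * connected by pipes, with the caller as the direct parent of each one.
 * Returns the exit status of the last command, or -1 on error. */
int run_pipeline(int ncmds, char **cmds[])
{
    if (ncmds <= 0)
        return -1;

    pid_t pids[ncmds];
    int started = 0;
    int prev_read = -1;     /* read end of the previous pipe; -1 for the first command */
    int last_status = -1;

    for (int i = 0; i < ncmds; i++) {
        int fd[2] = { -1, -1 };
        /* Every command except the last needs a new pipe for its output. */
        if (i < ncmds - 1 && pipe(fd) < 0) {
            perror("pipe");
            break;
        }
        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            if (fd[0] != -1) { close(fd[0]); close(fd[1]); }
            break;
        }
        if (pid == 0) {
            /* Child: stdin from the previous pipe, stdout to the new one. */
            if (prev_read != -1) { dup2(prev_read, STDIN_FILENO); close(prev_read); }
            if (fd[1] != -1) { dup2(fd[1], STDOUT_FILENO); close(fd[0]); close(fd[1]); }
            execvp(cmds[i][0], cmds[i]);
            perror(cmds[i][0]);
            _exit(127);
        }
        /* Parent: close the ends it no longer needs, keep the new read end. */
        pids[started++] = pid;
        if (prev_read != -1) close(prev_read);
        if (fd[1] != -1) close(fd[1]);
        prev_read = fd[0];
    }
    if (prev_read != -1)
        close(prev_read);   /* only reached with an open fd after an error */

    /* The parent can collect the status of every command it started. */
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            i == ncmds - 1 && WIFEXITED(status))
            last_status = WEXITSTATUS(status);
    }
    return (started == ncmds) ? last_status : -1;
}
```

The important detail is that the parent closes its copy of each write end as soon as the child has been forked; otherwise the reader downstream never sees EOF.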
Signals and SIGCHLD
The POSIX Signal Concepts section discusses SIGCHLD:
Under SIG_DFL:
Under SIG_IGN:
The description of <signal.h> has a table of default dispositions for signals, and for SIGCHLD, the default is I (SIG_IGN).
I added another function to the code above:
I called it immediately after the call to err_setarg0(), and it reports 'Default' on both Mac OS X 10.7.5 and Linux (RHEL 5, x86/64). I validated its operation by running:
On both platforms, that reported 'Ignored', and the pipeline command no longer reported the exit status of the child; it didn't get it.
So, if the program is ignoring SIGCHLD, it does not generate any zombies, but does wait until 'all' of its children terminate. That is, until all of its direct children terminate; a process cannot wait on its grandchildren or more distant progeny, nor on its siblings, nor on its ancestors.
On the other hand, if the setting for SIGCHLD is the default, the signal is ignored, and zombies are created.
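A check of this kind can be sketched with sigaction(), which reports the current disposition without changing it when the new-action argument is NULL. The function name sigchld_disposition() and the strings it returns are assumptions for this example, not the function referred to above.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: describe how SIGCHLD is currently handled.
 * Passing NULL as the new action only queries the existing one. */
const char *sigchld_disposition(void)
{
    struct sigaction sa;

    if (sigaction(SIGCHLD, NULL, &sa) != 0)
        return "Unknown";
    if (sa.sa_flags & SA_SIGINFO)
        return "Handler";           /* a three-argument handler is installed */
    if (sa.sa_handler == SIG_DFL)
        return "Default";
    if (sa.sa_handler == SIG_IGN)
        return "Ignored";
    return "Handler";
}
```

Because an ignored signal stays ignored across exec, a program started from a shell that has SIGCHLD ignored would see "Ignored" here.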
The default setting is the most convenient behaviour for this program as written. The corpse_collector() function has a loop that collects the status information from any children. There's only one child at a time with this code; the rest of the pipeline is run as a child (of the child, of the child, ...) of the last process in the pipeline.
I'm not sure I understand the problem. With the code I provided, cmd3 is the parent of cmd2, and cmd2 is the parent of cmd1 in a 3-command pipeline (and the shell is the parent of cmd3), so the shell can only wait on cmd3. I did state originally:
If you've got your shell able to wait on all three commands in the pipeline, you must be using the alternative organization.
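To make the parent/child chain concrete, here is a rough sketch of the recursive organization, in which each process becomes command n after forking a child to run commands 0..n-1. The names exec_nth() and run_recursive_pipeline() are made up for this example and it is a simplification, not the code from the pipeline program.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical: the calling process becomes command n, after forking a
 * child that runs commands 0..n-1 with its stdout connected to our stdin.
 * Never returns. */
static void exec_nth(int n, char **cmds[])
{
    if (n > 0) {
        int fd[2];
        pid_t pid;

        if (pipe(fd) < 0) { perror("pipe"); _exit(126); }
        if ((pid = fork()) < 0) { perror("fork"); _exit(126); }
        if (pid == 0) {
            /* Child: writes into the pipe, then handles the earlier commands. */
            dup2(fd[1], STDOUT_FILENO);
            close(fd[0]);
            close(fd[1]);
            exec_nth(n - 1, cmds);
        }
        /* Parent: reads from the pipe and goes on to exec command n. */
        dup2(fd[0], STDIN_FILENO);
        close(fd[0]);
        close(fd[1]);
    }
    execvp(cmds[n][0], cmds[n]);
    perror(cmds[n][0]);
    _exit(127);
}

/* Hypothetical: the "shell" forks one child, which becomes the LAST command.
 * The shell can therefore wait only on that last command. */
int run_recursive_pipeline(int ncmds, char **cmds[])
{
    int status;
    pid_t pid;

    if (ncmds <= 0)
        return -1;
    if ((pid = fork()) < 0) { perror("fork"); return -1; }
    if (pid == 0)
        exec_nth(ncmds - 1, cmds);
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In this layout cmd1 and cmd2 are a grandchild and great-grandchild of the shell, which is why the shell has no way to collect their statuses directly.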
The waitpid() description includes:
This means that if you're using process groups and the shell knows which process group the pipeline is running in (for example, because the pipeline is put into its own process group by the first process), then the parent can wait for the appropriate children to terminate.
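As a small sketch of that idea: when waitpid() is given a pid argument less than -1, it waits for any child whose process group ID equals the absolute value of that argument. The helper below is hypothetical; it uses trivial children (child i just exits with status i) in place of real pipeline commands, puts them all in the group led by the first child, and reaps them by group.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start nchildren children in one new process group
 * and reap them with waitpid(-pgid, ...). Returns the number of children
 * reaped, or -1 on error. */
int reap_process_group(int nchildren)
{
    pid_t pgid = 0;     /* 0 until the first child (the group leader) exists */
    int reaped = 0;
    int status;

    if (nchildren <= 0)
        return 0;

    for (int i = 0; i < nchildren; i++) {
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); return -1; }
        if (pid == 0) {
            /* The first child passes pgid 0 and becomes the group leader;
             * later children join the leader's group. */
            setpgid(0, pgid);
            _exit(i);
        }
        /* The parent sets the group too, the usual way to avoid a race;
         * a failure here (child already exited) is harmless and ignored. */
        if (pgid == 0)
            pgid = pid;
        setpgid(pid, pgid);
    }

    /* A negative pid means: any child in process group |pid|. */
    while (waitpid(-pgid, &status, 0) > 0)
        reaped++;
    return reaped;
}
```

Note that this still only reaps direct children of the caller; a process group does not let a shell wait on grandchildren.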
...rambling... I think there's some useful information here; there probably should be more that I'm writing, but my mind's gone blank.