EDIT: Changed the title, as the problem is no longer just how to connect them, but also how to wait for them.
Update: I solved the problem, and have updated my wait-handling code below to reflect what is now working. I needed to close all the pipes before waiting for the last sub-command. Previously I was doing that afterwards.
I'm writing a CLI as an assignment in Linux GNU99 C, and implementing pipes at the moment. Initially I thought my problem had to do with the way I had connected the pipes, because I wasn't getting the desired result. Now I've realised that it also has to do with how I wait for the sub-commands that are being chained.
As a template, I'm using the following command: ls|grep "hello"|sort -r. LS outputs to GREP, which outputs to SORT, which outputs to stdout (a common command sequence).
In reference to the diagram below, in the respective child processes:
For LS: file descriptors (FD) 3, 5 and 6 are not used.
For GREP: file descriptors (FD) 4 and 5 are not used.
For SORT: file descriptors (FD) 3, 4 and 6 are not used.
For LS: dup2(4, STDOUT_FILENO) (binds its stdout to fd 4).
For GREP: dup2(3, STDIN_FILENO) and dup2(6, STDOUT_FILENO) (binds both stdin/stdout to their respective fds).
For SORT: dup2(5, STDIN_FILENO) (binds stdin to fd 5).
In each child, once I've done the dup2() calls, I close all the file descriptors (3-6) before passing control to the actual command through execvp().
In the parent process, I close all the file descriptors (3-6) after I've launched all the children. (Moved this into the launcher, see code below.)
//
//   fd#        ls---\
//                   |
//    3   /-----R    |
//        |     |    |
//    4   |     W  --/
//        |
//        |
//        \----grep--\
//                   |
//    5   /-----R    |
//        |     |    |
//    6   |     W ---/
//        |
//        \----sort
//
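The wiring in the diagram can be sketched as a small standalone function. This is a minimal illustration under assumptions, not the code from my shell: the name run_three_stage_pipeline and the fds[] layout are made up for the example, and it waits for each child by PID rather than using a SIGCHLD handler. Every child and the parent close all four pipe descriptors, and the parent does so before it waits.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std=c99 */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run a | b | c in the foreground.
 * Each argument is a NULL-terminated argv array.
 * Returns the exit status of c, or -1 on error. */
int run_three_stage_pipeline(char *const a[], char *const b[], char *const c[])
{
    char *const *argvs[3] = { a, b, c };
    int fds[4];      /* fds[0]/fds[1]: first pipe, fds[2]/fds[3]: second pipe */
    pid_t pids[3];
    int started = 0, failed = 0, status = 0;

    assert(a && a[0] && b && b[0] && c && c[0]);

    if (pipe(&fds[0]) == -1)
        return -1;
    if (pipe(&fds[2]) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    for (int i = 0; i < 3; i++) {
        pid_t pid = fork();
        if (pid == -1) {
            failed = 1;
            break;
        }
        if (pid == 0) {
            /* Stage i reads from the previous pipe and writes to the next one. */
            if (i > 0 && dup2(fds[2 * (i - 1)], STDIN_FILENO) == -1)
                _exit(126);
            if (i < 2 && dup2(fds[2 * i + 1], STDOUT_FILENO) == -1)
                _exit(126);
            for (int k = 0; k < 4; k++)   /* close ALL pipe fds in the child */
                close(fds[k]);
            execvp(argvs[i][0], argvs[i]);
            perror(argvs[i][0]);
            _exit(127);
        }
        pids[started++] = pid;
    }

    /* The parent must close every pipe fd BEFORE waiting; otherwise a
     * reader never sees EOF and the wait can hang. */
    for (int k = 0; k < 4; k++)
        close(fds[k]);

    for (int i = 0; i < started; i++) {
        while (waitpid(pids[i], &status, 0) == -1) {
            if (errno != EINTR) {
                failed = 1;
                break;
            }
        }
    }
    if (failed)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Called with the argv arrays {"ls", NULL}, {"grep", "hello", NULL} and {"sort", "-r", NULL}, it should behave like the template command and return only after sort has finished.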
EDIT
Thanks 'mah' for the early confidence boost, and 'Jon' for the detailed explanation that came a little later.
I actually thought I got it all working at one point. But, as it turned out, only when all the sub-commands were executed as background processes. That was nice, but not quite what I want, since background processes require & at the end of the command line and the final output is not synchronised with the prompt.
As it currently stands, I seem to have commands with one pipe, e.g. ls|sort, working consistently in the foreground, but when I introduce a second pipe, e.g. ls|grep|sort, my prompt sometimes gets printed while the compound command is still outputting, which means it's running in the background rather than the foreground as it's supposed to.
Here is an explanation of my code:
The shell allows the user to type in more than one command, delimited by ;. Single and multiple commands which don't use pipes work fine, both as foreground and as background processes. I've also implemented a 'source' command which is able to recurse when the script calls another script.
So the only remaining problem I have is with compound commands that use pipes.
As per standard parsing, I've broken up the user's input into tokens delimited by NUL characters. I keep an array of pointers to each token (which represent commands and parameters), and a parallel array which keeps track of commands. Fairly standard approach, I think.
My strategy for dealing with compound commands using pipes has been to treat them as a single command as long as possible. This makes it easier to connect the sub-commands with the pipes (as I don't have to pass around extra information through my program) when I need to. So I designed the parser to give the pipe character a separate token of its own. Thus, in my launchControl function, which calls my launch function (where fork() and execvp() are), I do a final preparation of the sub-commands.
The final preparation involves a few steps:
(1) replacing the pipe tokens with NULL tokens (thus splitting the sequence into sub-commands compatible with execvp()),
(2) determining which tokens are the sub-commands (as opposed to parameters for the sub-commands),
(3) determining which sub-command reads from (or writes to) which pipe.
Having done these steps, I enter a loop that passes the necessary info for each sub-command to the launch function itself. After I finish this loop, I close all the pipes created. Here is the signature of my launch function:
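Steps (1) and (2) can be sketched roughly like this. It is a simplified illustration, not my actual parser: the function name split_on_pipes and its cmds/max_cmds parameters are assumptions made for the example. It overwrites each "|" token with NULL in place, so that every sub-command becomes a NULL-terminated argv suitable for execvp().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a NULL-terminated token list on "|" tokens.
 * Each "|" is overwritten with NULL, and cmds[k] is set to point at the
 * first token of sub-command k.
 * Returns the number of sub-commands, or -1 if a sub-command is empty or
 * there are more than max_cmds of them (tokens may already have been
 * partially modified in that case). */
int split_on_pipes(char **tokens, char ***cmds, int max_cmds)
{
    int n = 0;
    int at_start = 1;   /* the next ordinary token begins a sub-command */

    assert(tokens != NULL && cmds != NULL);

    for (size_t i = 0; tokens[i] != NULL; i++) {
        if (strcmp(tokens[i], "|") == 0) {
            if (at_start)
                return -1;          /* leading pipe, or two pipes in a row */
            tokens[i] = NULL;       /* terminates the previous sub-command */
            at_start = 1;
        } else if (at_start) {
            if (n == max_cmds)
                return -1;
            cmds[n++] = &tokens[i];
            at_start = 0;
        }
    }
    return at_start ? -1 : n;       /* trailing pipe or empty input */
}
```

For the template command, this yields three sub-commands, with cmds[1] pointing at the tokens "grep", "hello", NULL.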
int launch (char **tokenList, enum ioTypes procType, int pipeIn, int pipeOut, int *allPipes, enum processType pType)
tokenList is the sub-command token (followed by its parameters),
procType (either none, out, in, or both) describes its relation to the use of pipes,
pipeIn is the sub-command's input file descriptor (0 if not used),
pipeOut is the sub-command's output file descriptor (0 if not used),
allPipes is a list of all the pipes used in the compound command,
pType indicates whether the command is to run in the foreground or the background.
(I am using a Signal handler to allow background tasks to report when they are done, same as in bash.)
The launch function (for commands that involve pipes) does the following:
Blocks SIGCHLD, to delay it until I'm in the last sub-command.
Calls fork(), then, using a switch statement:
IN THE CHILD: (case process == 0)
Depending on procType, dup2() is called to connect the sub-command's stdin/stdout to the appropriate file descriptors (see diagram above).
Closes ALL the pipes, as per allPipes (including those used in the dup2() calls).
Performs redirection if necessary.
Calls execvp() with the sub-command/arguments in tokenList.
IN THE PARENT: (default case)
If the current sub-command is the last in the sequence, I unblock SIGCHLD.
And this is where I have my problem.
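For reference, the SIGCHLD blocking and unblocking step could look roughly like the following. This is a sketch using sigprocmask(); the helper names block_sigchld, restore_signal_mask and sigchld_is_blocked are invented for the example and are not the functions in my shell.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std=c99 */
#endif
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: block SIGCHLD and save the previous mask in *saved.
 * Returns 0 on success, -1 on error. */
int block_sigchld(sigset_t *saved)
{
    sigset_t set;

    assert(saved != NULL);
    if (sigemptyset(&set) == -1 || sigaddset(&set, SIGCHLD) == -1)
        return -1;
    return sigprocmask(SIG_BLOCK, &set, saved);
}

/* Hypothetical helper: put back the mask saved by block_sigchld().
 * Any SIGCHLD that arrived in the meantime is delivered after this call. */
int restore_signal_mask(const sigset_t *saved)
{
    assert(saved != NULL);
    return sigprocmask(SIG_SETMASK, saved, NULL);
}

/* Returns 1 if SIGCHLD is currently blocked, 0 if not, -1 on error. */
int sigchld_is_blocked(void)
{
    sigset_t current;

    /* With a NULL set, sigprocmask() only reports the current mask. */
    if (sigprocmask(SIG_BLOCK, NULL, &current) == -1)
        return -1;
    return sigismember(&current, SIGCHLD);
}
```

Restoring the saved mask, rather than unconditionally unblocking SIGCHLD, keeps the calls safe to nest if the signal was already blocked by the caller.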
The code below is WIP, works to some degree but not quite right. It is my current attempt.
//allPipes = NULL for a command that doesn't use pipes.
// procType == in, only occurs for the last sub-command in the sequence.
if ( (allPipes == NULL) || ( (allPipes != NULL) && (procType == in) )) {
    if (allPipes != NULL) {
        for (int i=2; i<allPipes[0]; i++) {   // Parent closes all pipes.
            close(allPipes[i]);
        }
    }
    int status;   // int where child status will be recorded
    pid_t pid;
    do {
        pid = waitpid(WAIT_ANY, &status,0);
        // fprintf(stderr,"Got a PID = %d\n",pid);
    } while (pid >0);
    if (pid == -1 && !(errno == ECHILD)) {
        perror(NULL);
        exit(errno);
    }
}
This version seems to work fine with ls|sort for as many repeated commands as I have the patience to test.
However, when I make the command ls|sort|grep, it becomes unreliable. It usually works fine the first two times, but after that, my prompt starts to appear in the middle of my output, which means that it's running in the background.
@mah:
Here is my code for tracking commands and pipes, and how I connect them:
struct pipefdRecord {
    int pos;        // Position of the pipe in the token list
    int aPipe[2];   // pipe file descriptor [0] read / [1] write
} pipefdRecord;
struct cmdRecord {
    char **command;       // Pointer to the sub-command token
    int ndxCommand;       // Position of command token in the token list
    enum ioTypes mode;    // none (0), output(1), input(2), or both(3)
    int pipeIn;           // pipe fd assigned to this process' input
    int pipeOut;          // pipe fd assigned to this process' output
} cmdRecord;
struct pipefdRecord *pipesAt = malloc(sizeof(struct pipefdRecord));
struct cmdRecord * cmdList = (struct cmdRecord *)malloc(sizeof(struct cmdRecord));
for (int i=0; i<noCommands; i++) {   // writing side of pipes
    for (int j=0; j<noPipes; j++) {
        if ((cmdList[i].ndxCommand < pipesAt[j].pos) && (pipesAt[j].aPipe[1] !=0)) {
            cmdList[i].pipeOut = pipesAt[j].aPipe[1];   // assign writing
            pipesAt[j].aPipe[1]=0;
            cmdList[i].mode = out;
            break;
        }
    }
}
for (int i=noCommands-1; i>=0; i--) {   // reading side of pipes
    for (int j=noPipes-1; j>=0; j--) {
        if (cmdList[i].ndxCommand > pipesAt[j].pos && (pipesAt[j].aPipe[0] !=0)) {
            cmdList[i].pipeIn = pipesAt[j].aPipe[0];   // assign reading
            pipesAt[j].aPipe[0]=0;
            cmdList[i].mode = cmdList[i].mode | in;
            break;
        }
    }
}
With the above code, my pipe allocations are always correct for an arbitrary number of pipes.
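For comparison, when the sub-commands are already in order, the same allocation can be computed directly from the index: sub-command i reads from pipe i-1 and writes to pipe i. The sketch below is an illustration only. struct stage is a cut-down, hypothetical stand-in for cmdRecord, the enum values mirror the ones noted in the comment above, and it uses -1 rather than 0 for "not used", since 0 is itself a valid file descriptor.

```c
#include <assert.h>
#include <stddef.h>

enum ioTypes { none = 0, out = 1, in = 2, both = 3 };  /* values as in the question */

/* Hypothetical, reduced version of cmdRecord for this example. */
struct stage {
    enum ioTypes mode;
    int pipeIn;    /* fd for stdin, or -1 if not piped */
    int pipeOut;   /* fd for stdout, or -1 if not piped */
};

/* pipes[] holds n_cmds - 1 pairs: [0] is the read end, [1] the write end.
 * pipes may be NULL when n_cmds == 1. */
void assign_pipe_ends(struct stage *stages, int n_cmds, int (*pipes)[2])
{
    assert(stages != NULL && n_cmds >= 1);
    assert(n_cmds == 1 || pipes != NULL);

    for (int i = 0; i < n_cmds; i++) {
        stages[i].mode = none;
        stages[i].pipeIn = -1;
        stages[i].pipeOut = -1;
        if (i > 0) {                      /* not the first: read from previous pipe */
            stages[i].pipeIn = pipes[i - 1][0];
            stages[i].mode |= in;
        }
        if (i < n_cmds - 1) {             /* not the last: write to next pipe */
            stages[i].pipeOut = pipes[i][1];
            stages[i].mode |= out;
        }
    }
}
```

With the descriptors from the diagram (pipes {3, 4} and {5, 6}), this gives LS output fd 4, GREP input fd 3 and output fd 6, and SORT input fd 5.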
Cheers, Nap
I think you're missing some closes, but you are lucky that the missing closes shouldn't prevent your code from working.
It appears from your description that you create two pipes, and that the descriptors returned are 3, 4, 5, 6.
What you should be doing is this (where I'm dropping the _FILENO from the file descriptor names):
ls: dup2(4, STDOUT); close each of file descriptors 3, 4, 5, 6.
sort: dup2(3, STDIN); dup2(6, STDOUT); close each of file descriptors 3, 4, 5, 6.
grep: dup2(5, STDIN); close each of file descriptors 3, 4, 5, 6.
Note the common theme: close all the pipe file descriptors!
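To see why the closes matter, here is a small self-contained demonstration: a reader only gets EOF once every descriptor referring to the write end of the pipe has been closed. The function name reader_sees_eof is made up for this sketch, and the read end is set to non-blocking purely so that the "writer still open" case returns immediately instead of hanging.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std=c99 */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical demo: create a pipe, duplicate its write end, close the
 * original write end, and optionally leave the duplicate open.
 * Returns 1 if a read on the pipe reports EOF, 0 if it would block because
 * a write end is still open somewhere, and -1 on error. */
int reader_sees_eof(int leave_extra_writer_open)
{
    int p[2];
    int result = -1;
    char byte;

    if (pipe(p) == -1)
        return -1;

    int extra = dup(p[1]);                  /* a second reference to the write end */
    int flags = fcntl(p[0], F_GETFL);
    if (extra != -1 && flags != -1 &&
        fcntl(p[0], F_SETFL, flags | O_NONBLOCK) != -1) {
        close(p[1]);
        p[1] = -1;
        if (!leave_extra_writer_open) {
            close(extra);
            extra = -1;
        }

        ssize_t n = read(p[0], &byte, 1);
        assert(n <= 0);                     /* nothing was ever written */
        if (n == 0)
            result = 1;                     /* EOF: no write ends remain */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            result = 0;                     /* a writer could still produce data */
    }

    if (extra != -1)
        close(extra);
    if (p[1] != -1)
        close(p[1]);
    close(p[0]);
    return result;
}
```

With a blocking read, the second case is exactly the situation where a process in the pipeline waits forever for input that will never arrive.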
What happens if you don't?
ls: you've left file descriptor 4 (write end of the pipe from ls to sort) open; this doesn't matter much as ls does its thing and exits without further ado, closing this end of the pipe.
sort: you've left file descriptors 3 (read end of the pipe from ls to sort) and 6 (write end of the pipe from sort to grep) open. File descriptor 3 will report EOF when ls exits. When sort completes, it will close 6. Since the write end of the pipe (both file descriptors 1 and 4 in ls, and 4 in sort) are closed, sort should get a clean EOF after reading the last of the output from ls. Note that sort reads all its input before generating any output.
grep: you've left file descriptor 5 (read end of the pipe from sort to grep) open. In due course, when sort writes its data, grep will be able to read it. When sort completes, grep will get EOF on its standard input.
So, in this example, you've managed to make a pipeline that should work cleanly. However, in general, you should still be closing more file descriptors because it is otherwise easy to end up with an open write end of a pipe that prevents the programs from completing. For example, if grep had not closed 4, then sort would be waiting for input from grep and grep would be waiting for input from sort, and neither would budge until the other was complete: deadlock.
Corrigenda to statements in the question
In your description, you say:
You've not described what happens correctly. The dup2(4, STDOUT) function ensures that standard output (file descriptor 1) points to the same open file description as file descriptor 4. (Read open() and dup2() very carefully to distinguish between open file descriptors and open file descriptions!) This means that when the child that becomes ls writes to standard output, it is writing to the write end of the first pipe, which means it goes to grep. The ls program continues as it always does, writing to standard output; it is just that standard output is the same as file descriptor 4.
Similar comments apply to each of the other statements. The sort reads from standard input and writes to standard output; the grep reads from standard input and writes to standard output. The dup2() calls ensure that these are references to the relevant pipes, that's all.
Note that the duplicated descriptors can be closed independently without affecting the other.
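That last point can be checked with a short sketch. The function write_through_duplicate is a made-up name for illustration, and it uses dup() rather than dup2() so that no standard stream is disturbed; the principle is the same. It closes the original write end first, writes through the duplicate, and reads the data back. It assumes a short message that fits in the pipe buffer.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict -std=c99 */
#endif
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: two descriptors can share one open file description.
 * Writes msg through a duplicate of a pipe's write end after the original
 * descriptor has been closed, then reads it back into out.
 * Returns the number of bytes read, or -1 on error. */
ssize_t write_through_duplicate(const char *msg, char *out, size_t out_size)
{
    int p[2];
    ssize_t n = -1;

    assert(msg != NULL && out != NULL && out_size > 0);

    if (pipe(p) == -1)
        return -1;

    int copy = dup(p[1]);   /* second descriptor, same open file description */
    close(p[1]);            /* closing the original does not affect the copy */

    if (copy != -1) {
        size_t len = strlen(msg);
        if (write(copy, msg, len) == (ssize_t)len) {
            close(copy);    /* last write end gone: reader gets EOF after the data */
            copy = -1;
            n = read(p[0], out, out_size - 1);
            if (n >= 0)
                out[n] = '\0';
        }
        if (copy != -1)
            close(copy);
    }
    close(p[0]);
    return n;
}
```

This is the same reason a child can safely close descriptors 3 to 6 after its dup2() calls: descriptors 0 and 1 keep the pipe ends open on their own.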