EDIT: Changed the title, as the problem is no longer just how to connect them, but also how to wait for them. UPDATE: I solved the problem, and have updated my wait handling code below to reflect what is now working. I needed to close all the pipes before waiting for the last sub-command. Previously I was doing that afterwards.
I'm writing a CLI as an assignment in Linux GNU99 C, and implementing pipes at the moment. Initially I thought my problem had to do with the way I had connected the pipes, because I wasn't getting the desired result. Now I've realised that it also has to do with how I wait for the sub-commands that are being chained.
As a template, I'm using the following command: ls|grep "hello"|sort -r. ls outputs to grep, which outputs to sort, which outputs to stdout (a common command sequence).
In reference to the diagram below:
In the respective child processes, these file descriptors (FDs) are not used:
For ls: FDs 3, 5 and 6
For grep: FDs 4 and 5
For sort: FDs 3, 4 and 6
The descriptors that are used get bound as follows:
For ls: dup2(4, STDOUT_FILENO) (binds its stdout to fd 4)
For grep: dup2(3, STDIN_FILENO) and dup2(6, STDOUT_FILENO) (binds both stdin/stdout to their respective fds)
For sort: dup2(5, STDIN_FILENO) (binds stdin to fd 5)
In each child, once I've done the dup2() calls, I close all the file descriptors (3-6) before passing control to the actual command through execvp().
In the parent process, I close all the file descriptors (3-6) after I've launched all the children. (Moved this into the launcher, see code below.)
//
// fd# ls---\
// |
// 3 /-----R |
// | | |
// 4 | W --/
// |
// |
// \----grep--\
// |
// 5 /-----R |
// | | |
// 6 | W ---/
// |
// \----sort
//
EDIT
Thanks 'mah' for the early confidence boost, and 'Jon' for the detailed explanation that came a little later.
I actually thought I got it all working at one point. But, as it turned out, only when all the sub-commands were executed as background processes. That was nice, but not quite what I want, since background processes require & at the end of the command line and the final output is not synchronised with the prompt.
As it currently stands, I seem to have commands with one pipe, e.g. ls|sort, working consistently in the foreground, but when I introduce a second pipe, e.g. ls|grep|sort, my prompt sometimes gets printed while the compound command is still outputting, which means it's running in the background rather than in the foreground as it's supposed to.
Here is an explanation of my code:
The shell allows the user to type in more than one command, delimited by ;. Single and multiple commands which don't use pipes work fine, both as foreground and as background processes. I've also implemented a 'source' command which is able to recurse when the script calls another script.
So the only remaining problem I have is with compound commands that use pipes.
As per standard parsing, I've broken up the user's input into tokens delimited by NUL characters. I keep an array of pointers to each token (which represent commands and parameters), and a parallel array which keeps track of commands. Fairly standard approach I think.
My strategy for dealing with compound commands using pipes has been to treat them as a single command as long as possible. This makes it easier to connect the sub-commands with the pipes (as I don't have to pass around extra information through my program) when I need to. So I designed the parser to give the pipe character a separate token of its own. Thus, in my launchControl function, which calls my launch function (where fork() and execvp() are), I do a final preparation of the sub-commands.
The final preparation involves a few steps:
(1) replacement of the pipe tokens with NULL tokens (thus splitting the sequence into sub-commands compatible with execvp()),
(2) determining which tokens are the sub-commands (as opposed to parameters for the sub-commands),
(3) determining which sub-command reads from or writes to which pipe.
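Steps (1) and (2) can be sketched roughly as follows. This is a minimal illustration, not the parser from this question: split_at_pipes is a hypothetical name, and it assumes the token array holds ntokens pointers followed by a terminating NULL, with each pipe character already in a token of its own.

```c
#include <string.h>

/* Replace every "|" token with NULL and record where each sub-command
 * starts, so that cmds[k] can be passed straight to execvp().
 * Returns the number of sub-commands, or -1 for an empty sub-command
 * (leading, trailing or doubled pipe) or when max_cmds is exceeded.
 * On error the token array may already be partly modified. */
int split_at_pipes(char **tokens, int ntokens, char ***cmds, int max_cmds)
{
    int n = 0;
    int at_start = 1;   /* true when the next token begins a sub-command */

    for (int i = 0; i < ntokens; i++) {
        if (strcmp(tokens[i], "|") == 0) {
            if (at_start)
                return -1;          /* nothing between two pipes */
            tokens[i] = NULL;       /* terminates the previous argv */
            at_start = 1;
        } else if (at_start) {
            if (n == max_cmds)
                return -1;
            cmds[n++] = &tokens[i]; /* first token is the command itself */
            at_start = 0;
        }
    }
    return at_start ? -1 : n;       /* at_start here means trailing pipe or no tokens */
}
```

With the tokens ls, |, grep, hello, |, sort, -r this yields three sub-commands, each a NULL-terminated argv inside the original token array, so no copying is needed.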
Having done these steps, I enter a loop that passes the necessary info for each sub-command to the launch function itself. After I finish this loop, I close all the pipes created. Here is the signature of my launch function:
int launch (char **tokenList, enum ioTypes procType, int pipeIn, int pipeOut, int *allPipes, enum processType pType)
tokenList is the sub-command token (followed by its parameters),
procType (either none, out, in, or both) describes its relation to the use of pipes,
pipeIn is the sub-command's input file descriptor (0 if not used),
pipeOut is the sub-command's output file descriptor (0 if not used),
allPipes is a list of all the pipes used in the compound command,
pType indicates whether the command is to run foreground/background.
(I am using a Signal handler to allow background tasks to report when they are done, same as in bash.)
The launch function (for commands that involve pipes) does the following:
Blocks SIGCHLD, to delay it until I'm in the last sub-command.
Calls fork(), then, using a switch statement:
IN THE CHILD: (case process == 0)
Depending on procType, dup2() is called to connect the sub-command's stdin/stdout to the appropriate file descriptors (see diagram above).
Closes ALL the pipes, as per allPipes (including those used in the dup2() calls).
Performs redirection if necessary.
Calls execvp() with the sub-command/arguments in tokenList.
IN THE PARENT: (default case)
If the current sub-command is the last in the sequence, I unblock SIGCHLD.
And this is where I have my problem.
The code below is WIP, works to some degree but not quite right. It is my current attempt.
//allPipes = NULL for a command that doesn't use pipes.
// procType == in, only occurs for the last sub-command in the sequence.
if ( (allPipes == NULL) || ( (allPipes != NULL) && (procType == in) )) {
    if (allPipes != NULL) {
        for (int i=2; i<allPipes[0]; i++) { // Parent closes all pipes.
            close(allPipes[i]);
        }
    }
    int status; // int where child status will be recorded
    pid_t pid;
    do {
        pid = waitpid(WAIT_ANY, &status,0);
        // fprintf(stderr,"Got a PID = %d\n",pid);
    } while (pid >0);
    if (pid == -1 && !(errno == ECHILD)) {
        perror(NULL);
        exit(errno);
    }
}
This version seems to work fine with ls|sort for as many repeated commands as I have the patience to test.
However, when I make the command ls|sort|grep it becomes unreliable. It usually works fine the first two times, but after that, my prompt starts to appear in the middle of my output, which means that it's running in the background.
@mah:
Here is my code for tracking commands and pipes, and how I connect them:
struct pipefdRecord {
    int pos;        // Position of the pipe in the token list
    int aPipe[2];   // pipe file descriptor [0] read / [1] write
} pipefdRecord;
struct cmdRecord {
    char **command;     // Pointer to the sub-command token
    int ndxCommand;     // Position of command token in the token list
    enum ioTypes mode;  // none (0), output(1), input(2), or both(3)
    int pipeIn;         // pipe fd assigned to this process' input
    int pipeOut;        // pipe fd assigned to this process' output
} cmdRecord;
struct pipefdRecord *pipesAt = malloc(sizeof(struct pipefdRecord));
struct cmdRecord * cmdList = (struct cmdRecord *)malloc(sizeof(struct cmdRecord));
for (int i=0; i<noCommands; i++) { // writing side of pipes
    for (int j=0; j<noPipes; j++) {
        if ((cmdList[i].ndxCommand < pipesAt[j].pos) && (pipesAt[j].aPipe[1] !=0)) {
            cmdList[i].pipeOut = pipesAt[j].aPipe[1]; // assign writing
            pipesAt[j].aPipe[1]=0;
            cmdList[i].mode = out;
            break;
        }
    }
}
for (int i=noCommands-1; i>=0; i--) { // reading side of pipes
    for (int j=noPipes-1; j>=0; j--) {
        if (cmdList[i].ndxCommand > pipesAt[j].pos && (pipesAt[j].aPipe[0] !=0)) {
            cmdList[i].pipeIn = pipesAt[j].aPipe[0]; // assign reading
            pipesAt[j].aPipe[0]=0;
            cmdList[i].mode = cmdList[i].mode | in;
            break;
        }
    }
}
With the above code, my pipe allocations are always correct for an arbitrary number of pipes.
Cheers, Nap