Lets consider the following code (please do not write, that there are naming problems, structuring problems, etc, I know this, too). It was written to write out the random generated x,y,z and r (and pid) numbers for its 3 children, but it often happens that it only prints two/one "Got this..." lines, and I dont know, why. Could you please explain me, what the problem is, or correct my code?
#include <stdlib.h>
#include <stdio.h>
#include <sys/types.h> //fork
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h> //lock
#include <signal.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include "sys/ipc.h"
#include "sys/sem.h"
int child;
int cs[3];
int fd[2];
int t;
int parent;
int child;
void okay(int sign)
{
t = 0;
}
void rr(int sign)
{
char b[50];
while(read(fd[0], &b, 50)<0) sleep(0.5);
printf("Got this: %s \n", b);
}
void ch(int argc, char** argv)
{
printf("Mypid: %i\n", getpid());
close(fd[0]);
while(t==1) sleep(1);
srand((unsigned)time(NULL)); // init
int x,y,z,r,pid;
x = rand() % 101; y = rand() % 101; z = rand() % 101; r = rand() % 101;
pid = getpid();
char b[50];
sprintf(b, "%i %i %i %i %i", pid, x, y, z, r);
while(write(fd[1], b, 50)<0) sleep(0.2);
kill(parent, SIGUSR2);
close(fd[1]);
}
int main(int argc, char** argv)
{
if(argc < 4)
{
printf("Too few args!\n");
return 0;
}
pipe(fd);
t = 1;
parent = getpid();
signal(SIGUSR1, okay);
child = fork();
if(child < 0) perror("FORK");
if(child > 0)
{
cs[0] = child;
child = fork();
if(child < 0) perror("FORK");
if(child > 0)
{
cs[1] = child;
child = fork();
if(child < 0) perror("FORK");
if(child > 0)
{
cs[2] = child; // MAIN
printf("%i %i %i\n", cs[0], cs[1], cs[2]);
close(fd[1]);
signal(SIGUSR2, rr);
kill(cs[0], SIGUSR1); kill(cs[1], SIGUSR1); kill(cs[2], SIGUSR1);
int status;
waitpid(cs[0], &status, 0);
waitpid(cs[1], &status, 0);
waitpid(cs[2], &status, 0);
close(fd[0]);
}else
{ // ch 3
ch(argc, argv);
}
}else
{ // ch 2
ch(argc, argv);
}
}else
{ // ch 1
ch(argc, argv);
}
return 0;
}
Rewritten answer
I was able to get the behaviour described even with various amended versions of the code. For example, one trace I got from a diagnostic-laden version of the code was:
You can see that the parent got the data from 14609 and 14610, but not from 14608. I'm going to attribute this to the use of signals. They're a very poor mechanism for IPC. And, in this case, they seem to be unreliable on the timing. This was code using
sigaction()
and with thesa.sa_mask
value set to block all signals (sigfillset(&sa.sa_mask)
).However, there really isn't any need to use signals from the child back to the parent. I've left the signal handler in place for the parent to notify the children to get weaving, but simplified it to simply change the value of a
volatile sig_atomic_t
variable (t
by name, still) from 1 to 0. The expression is to 'use' the signal number parameter (calledsign
in the code); it avoids a warning when I compile using GCC 4.7.1 on Mac OS X 10.7.5:The seeds to
srand()
mix the time with the PID of the process to give different values from each child (using the PID alone would also do that). I weeded out the 16 headers in the original (including two repeats) to 7. I've removedrr()
since the parent is no longer responding to signals from the children. I've restructured the code inmain()
so it doesn't dive off the RHS of the page. The code includes copious diagnostics about what's going on. It is helpful when dealing with multiple processes like this if the majority of the messages have a PID printed as part of the message. I used 'Adult' instead of 'Parent' so the output is neatly aligned with the lines tagged 'Child'. Note that the signal handler is set before the children are forked. On a multi-CPU machine, there is no guarantee about the sequence in which the processes will execute, so leaving signal setup until after forking is unwise at best and liable to lead to unexpected death at worst.The reading in the signal handler is replaced by reading in the parent code in
main()
; this is a much more satisfactory way of dealing with input. You should aim to do as little as possible in a signal handler. The C standard doesn't reliably support much more:POSIX is more lenient, but you still need to be very careful about what you do in a signal handler, and you should do as little as possible in a signal handler.
These changes lead to this code:
The code is still flabby on the error handling; there are a lot of unchecked system calls, and unreported results (like child statuses). I'm not convinced about retrying writes on failures, but the code was never exercised.
This is a trace from the revised version of the code.
A few times, I got inputs such as:
That combines the output of processes 15736 and 15737 into a single result from read. I'm not happy about that; AFAIK, the reads should be getting the atomic writes of the separate children as separate messages. I'm going to put that down to a quirk of Mac OS X without having researched it further.
Original answer
Since you're using
signal()
rather thansigaction()
, it is possible that your signal handler is reset to SIGDFL before the signal handler is called. You could fix that inokay()
by adding:You could monitor whether that's a problem by checking the return value from
signal()
in the handler.The rest of your code isn't currently using(Inaccurate observation!)t
(though it is set to1
inmain()
).You could make your debugging easier by having more print operations. You could use a loop to kill and collect your children (though it is possible to write the loop out as you have done; don't put three function calls on a single line, though).