C fork and pipe program with non-deterministic out

2019-09-21 07:48发布

问题:

Lets consider the following code (please do not write, that there are naming problems, structuring problems, etc, I know this, too). It was written to write out the random generated x,y,z and r (and pid) numbers for its 3 children, but it often happens that it only prints two/one "Got this..." lines, and I dont know, why. Could you please explain me, what the problem is, or correct my code?

#include <stdlib.h> 
#include <stdio.h>
#include <sys/types.h> //fork
#include <sys/stat.h>
#include <unistd.h>
#include <string.h>
#include <fcntl.h> //lock
#include <signal.h>
#include <time.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/shm.h>
#include <sys/wait.h>
#include "sys/ipc.h"
#include "sys/sem.h"

int child;
int cs[3];
int fd[2];
int t;
int parent;
int child;

void okay(int sign)
{
  t = 0;
}

void rr(int sign)
{
  char b[50];
  while(read(fd[0], &b, 50)<0) sleep(0.5);
  printf("Got this: %s \n", b);
}

void ch(int argc, char** argv)
{
  printf("Mypid: %i\n", getpid());
  close(fd[0]);
  while(t==1) sleep(1);
  srand((unsigned)time(NULL)); // init
  int x,y,z,r,pid;
  x = rand() % 101; y = rand() % 101; z = rand() % 101; r = rand() % 101;
  pid = getpid();
  char b[50];
  sprintf(b, "%i %i %i %i %i", pid, x, y, z, r);
  while(write(fd[1], b, 50)<0) sleep(0.2);
  kill(parent, SIGUSR2);
  close(fd[1]);
}

int main(int argc, char** argv)
{
  if(argc < 4)
  {
    printf("Too few args!\n");
    return 0;
  }
  pipe(fd);
  t = 1;
  parent = getpid();
  signal(SIGUSR1, okay);
  child = fork();
  if(child < 0) perror("FORK");
  if(child > 0)
  {
    cs[0] = child;

    child = fork();
    if(child < 0) perror("FORK");
    if(child > 0)
    {

      cs[1] = child;

      child = fork();
      if(child < 0) perror("FORK");
      if(child > 0)
      {
        cs[2] = child; // MAIN
        printf("%i %i %i\n", cs[0], cs[1], cs[2]);
        close(fd[1]);
        signal(SIGUSR2, rr);
        kill(cs[0], SIGUSR1); kill(cs[1], SIGUSR1); kill(cs[2], SIGUSR1);
        int status;

            waitpid(cs[0], &status, 0);
            waitpid(cs[1], &status, 0);
            waitpid(cs[2], &status, 0);
            close(fd[0]);

      }else
      { // ch 3
        ch(argc, argv);
      }   

    }else
    { // ch 2
      ch(argc, argv);
    }  

  }else
  { // ch 1
    ch(argc, argv);
  }
  return 0;
}

回答1:

Rewritten answer

I was able to get the behaviour described even with various amended versions of the code. For example, one trace I got from a diagnostic-laden version of the code was:

14607 at work
Children: 14608 14609 14610
Children signalled
Child 14609: signal 30 - setting t to 0
Child 14608: signal 30 - setting t to 0
Child 14610: signal 30 - setting t to 0
Child 14609: at work
Child 14610: at work
Child 14608: at work
Child 14609: sending 14609 65 24 97 0
Child 14609: exiting
Child 14610: sending 14610 87 17 23 57
Adult 14607: signal 31 - reading input
Child 14610: exiting
Child 14608: sending 14608 5 89 95 8
Child 14608: exiting
Adult 14607: got <<14609 65 24 97 0>>
Adult 14607: signal 31 - reading input
Adult 14607: got <<14610 87 17 23 57>>
Child 1 ended
Child 2 ended
Child 3 ended
14607 exiting

You can see that the parent got the data from 14609 and 14610, but not from 14608. I'm going to attribute this to the use of signals. They're a very poor mechanism for IPC. And, in this case, they seem to be unreliable on the timing. This was code using sigaction() and with the sa.sa_mask value set to block all signals (sigfillset(&sa.sa_mask)).

However, there really isn't any need to use signals from the child back to the parent. I've left the signal handler in place for the parent to notify the children to get weaving, but simplified it to simply change the value of a volatile sig_atomic_t variable (t by name, still) from 1 to 0. The expression is to 'use' the signal number parameter (called sign in the code); it avoids a warning when I compile using GCC 4.7.1 on Mac OS X 10.7.5:

gcc -O3 -g -std=c99 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes \
    pipes-13905948.c -o pipes-13905948

The seeds to srand() mix the time with the PID of the process to give different values from each child (using the PID alone would also do that). I weeded out the 16 headers in the original (including two repeats) to 7. I've removed rr() since the parent is no longer responding to signals from the children. I've restructured the code in main() so it doesn't dive off the RHS of the page. The code includes copious diagnostics about what's going on. It is helpful when dealing with multiple processes like this if the majority of the messages have a PID printed as part of the message. I used 'Adult' instead of 'Parent' so the output is neatly aligned with the lines tagged 'Child'. Note that the signal handler is set before the children are forked. On a multi-CPU machine, there is no guarantee about the sequence in which the processes will execute, so leaving signal setup until after forking is unwise at best and liable to lead to unexpected death at worst.

The reading in the signal handler is replaced by reading in the parent code in main(); this is a much more satisfactory way of dealing with input. You should aim to do as little as possible in a signal handler. The C standard doesn't reliably support much more:

ISO/IEC 9899:2011 §7.14.1 The signal function

¶5 If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static or thread storage duration that is not a lock-free atomic object other than by assigning a value to an object declared as volatile sig_atomic_t, or the signal handler calls any function in the standard library other than the abort function, the _Exit function, the quick_exit function, or the signal function with the first argument equal to the signal number corresponding to the signal that caused the invocation of the handler.

POSIX is more lenient, but you still need to be very careful about what you do in a signal handler, and you should do as little as possible in a signal handler.

These changes lead to this code:

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static int fd[2];
static volatile sig_atomic_t t = 1;
static int parent;

static void okay(int sign)
{
    t = (sign == 0);
}

static void ch(void)
{
    int pid = getpid();
    printf("Child %i: at work\n", pid);
    close(fd[0]);
    while (t == 1)
    {
        printf("Child %d: pausing on t\n", pid);
        pause();
    }
    srand((unsigned)time(NULL) ^ pid);
    int x = rand() % 101;
    int y = rand() % 101;
    int z = rand() % 101;
    int r = rand() % 101;
    char b[50];
    sprintf(b, "%i %i %i %i %i", pid, x, y, z, r);
    printf("Child %d: sending %s\n", pid, b);
    while (write(fd[1], b, strlen(b)) < 0)
        printf("Child %d: write failed\n", pid);

    close(fd[1]);
    printf("Child %d: exiting\n", pid);
    exit(0);
}

int main(void)
{
    int cs[3];
    pipe(fd);
    parent = getpid();
    printf("%d at work\n", parent);

    struct sigaction sa;
    sa.sa_flags = 0;
    sigfillset(&sa.sa_mask);

    sa.sa_handler = okay;
    sigaction(SIGUSR1, &sa, 0);

    if ((cs[0] = fork()) < 0)
        perror("fork 1");
    else if (cs[0] == 0)
        ch();
    else if ((cs[1] = fork()) < 0)
        perror("fork 2");
    else if (cs[1] == 0)
        ch();
    else if ((cs[2] = fork()) < 0)
        perror("fork 3");
    else if (cs[2] == 0)
        ch();
    else
    {
        printf("Children: %i %i %i\n", cs[0], cs[1], cs[2]);
        close(fd[1]);

        kill(cs[0], SIGUSR1);
        kill(cs[1], SIGUSR1);
        kill(cs[2], SIGUSR1);
        printf("Children signalled\n");

        char buffer[64];
        int nbytes;
        while ((nbytes = read(fd[0], buffer, sizeof(buffer)-1)) > 0)
        {
            buffer[nbytes] = '\0';
            printf("Adult %d: read <<%s>>\n", parent, buffer);
        }

        int status;
        waitpid(cs[0], &status, 0);
        printf("Child 1 ended\n");
        waitpid(cs[1], &status, 0);
        printf("Child 2 ended\n");
        waitpid(cs[2], &status, 0);
        printf("Child 3 ended\n");
        close(fd[0]);
    }
    printf("%d exiting\n", (int)getpid());
    return 0;
}

The code is still flabby on the error handling; there are a lot of unchecked system calls, and unreported results (like child statuses). I'm not convinced about retrying writes on failures, but the code was never exercised.

This is a trace from the revised version of the code.

15745 at work
Children: 15746 15747 15748
Children signalled
Child 15746: at work
Child 15746: sending 15746 63 4 70 89
Child 15748: at work
Child 15746: exiting
Child 15747: at work
Adult 15745: read <<15746 63 4 70 89>>
Child 15748: sending 15748 44 0 99 37
Child 15748: exiting
Child 15747: sending 15747 3 69 68 97
Adult 15745: read <<15748 44 0 99 37>>
Child 15747: exiting
Adult 15745: read <<15747 3 69 68 97>>
Child 1 ended
Child 2 ended
Child 3 ended
15745 exiting

A few times, I got inputs such as:

Adult 15734: read <<15736 83 95 64 2915737 42 63 66 89>>

That combines the output of processes 15736 and 15737 into a single result from read. I'm not happy about that; AFAIK, the reads should be getting the atomic writes of the separate children as separate messages. I'm going to put that down to a quirk of Mac OS X without having researched it further.


Original answer

Since you're using signal() rather than sigaction(), it is possible that your signal handler is reset to SIGDFL before the signal handler is called. You could fix that in okay() by adding:

void okay(int sign)
{
    signal(sign, okay);
    t = 0;
}

You could monitor whether that's a problem by checking the return value from signal() in the handler.

The rest of your code isn't currently using t (though it is set to 1 in main()). (Inaccurate observation!)

You could make your debugging easier by having more print operations. You could use a loop to kill and collect your children (though it is possible to write the loop out as you have done; don't put three function calls on a single line, though).