Will wait and waitpid block SIGCHLD and unblock it?

Published 2019-08-04 08:04

Question:

Here is my code to examine this:

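#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

void handler(int n) {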
    printf("handler %d\n", n);
    int status;
    if (wait(&status) < 0)
        printf("%s\n", strerror(errno));
}

int main() {
    struct sigaction sig;
    sigemptyset(&sig.sa_mask);
    sig.sa_handler = handler;
    sig.sa_flags = 0;
    sig.sa_restorer = NULL;
    struct sigaction sigold;
    sigaction(SIGCHLD, &sig, &sigold);
    pid_t pid;
    int status;
    printf("before fork\n");
    if ((pid = fork()) == 0) {
        _exit(127);
    } else if (pid > 0) {
        printf("before waitpid\n");
        if (waitpid(pid, &status, 0) < 0)
            printf("%s\n", strerror(errno));
        printf("after waitpid\n");
    }
    printf("after fork\n");
    return 0;
}

The output is:

before fork
before waitpid
handler 17
No child processes
after waitpid
after fork

So I think waitpid blocks SIGCHLD while it waits for the child to terminate; once the child terminates, it does its work and then unblocks SIGCHLD before returning. That would explain why we see the "No child processes" error and why "after waitpid" comes after "handler 17". Am I right? If not, what is really happening, and how can the output sequence be explained? Is there a specification for Linux, or something similar, that I can check?

Answer 1:

The exit information for a process can only be collected once. Your output shows the signal handler being called while your code is in waitpid(), but the handler calls wait(), and that collects the information of the child (which you throw away without reporting). Then, when you get back to waitpid(), the child's exit status has already been collected, so there's nothing left for waitpid() to report on, hence the "No child processes" error.
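
The "collected only once" rule can be seen in isolation, without any signal handler involved. The following is a minimal sketch, not part of the original program; the helper name reap_twice is hypothetical. It reaps the same child twice and reports the errno from the second attempt, which should be ECHILD ("No child processes").

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, collect its
 * status once, then try to collect it a second time.
 * Returns the errno of the second waitpid() (ECHILD is expected),
 * 0 if the second call unexpectedly succeeded, or -1 if fork() or the
 * first waitpid() failed. */
int reap_twice(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                      /* child: exit immediately */

    int status;
    if (waitpid(pid, &status, 0) != pid)  /* first collection succeeds */
        return -1;
    printf("first waitpid: child %d, raw status 0x%.4X\n",
           (int)pid, (unsigned)status);

    if (waitpid(pid, &status, 0) < 0) {   /* nothing left to collect */
        int err = errno;
        printf("second waitpid: %s\n", strerror(err));
        return err;
    }
    return 0;
}
```

Calling reap_twice(127) from main() should print one successful collection followed by the error text for ECHILD.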

Here's an adaptation of your program. It abuses things by using printf() inside the signal handler function, but it seems to work despite that, testing on a Mac running macOS Sierra 10.12.4 (compiling with GCC 7.1.0).

#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

static void handler(int n)
{
    printf("handler %d\n", n);
    int status;
    int corpse;
    if ((corpse = wait(&status)) < 0)
        printf("%s: %s\n", __func__, strerror(errno));
    else
        printf("%s: child %d exited with status 0x%.4X\n", __func__, corpse, status);
}

int main(void)
{
    struct sigaction sig = { 0 };
    sigemptyset(&sig.sa_mask);
    sig.sa_handler = handler;
    sig.sa_flags = 0;
    sigaction(SIGCHLD, &sig, NULL);
    pid_t pid;
    printf("before fork\n");
    if ((pid = fork()) == 0)
    {
        _exit(127);
    }
    else if (pid > 0)
    {
        printf("before waitpid\n");
        int status;
        int corpse;
        while ((corpse = waitpid(pid, &status, 0)) > 0 || errno == EINTR)
        {
            if (corpse < 0)
                printf("loop: %s\n", strerror(errno));
            else
                printf("%s: child %d exited with status 0x%.4X\n", __func__, corpse, status);
        }
        if (corpse < 0)
            printf("%s: %s\n", __func__, strerror(errno));
        printf("after waitpid loop\n");
    }
    printf("after fork\n");
    return 0;
}

Sample output:

before fork
before waitpid
handler 20
handler: child 29481 exited with status 0x7F00
loop: Interrupted system call
main: No child processes
after waitpid loop
after fork

The status value 0x7F00 is the normal encoding for _exit(127). The signal number is different for macOS from Linux; that's perfectly permissible.
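
Rather than reading the raw value, the status can be decoded with the macros from <sys/wait.h>. This is a small sketch with hypothetical helper names (exit_code_from_status, run_child); it assumes the child terminates normally via _exit().

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: return the exit code encoded in a raw wait status,
 * or -1 if the child did not exit normally (for example, it was killed
 * by a signal). */
int exit_code_from_status(int status)
{
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}

/* Hypothetical helper: fork a child that calls _exit(code), wait for it,
 * and return the decoded exit code, or -1 on any failure. */
int run_child(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    printf("raw status 0x%.4X, exit code %d\n",
           (unsigned)status, exit_code_from_status(status));
    return exit_code_from_status(status);
}
```

With the encoding shown in the sample output, exit_code_from_status(0x7F00) yields 127, matching the argument passed to _exit().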


To get the code to compile on Linux (CentOS 7 and Ubuntu 16.04 LTS were used for the test) with GCC 4.8.5 (almost antediluvian; the current version is GCC 7.1.0) and GCC 5.4.0 respectively, using the command line:

$ gcc -O3 -g -std=c11 -Wall -Wextra -Werror -Wmissing-prototypes \
>     -Wstrict-prototypes -Wold-style-definition sg59.c -o sg59
$

I added #define _XOPEN_SOURCE 800 before the first header, and used:

struct sigaction sig;
memset(&sig, '\0', sizeof(sig));

to initialize the structure with GCC 4.8.5. That sort of shenanigan is occasionally a painful necessity to avoid compiler warnings. I note that although the #define was necessary to expose POSIX symbols, the initializer (struct sigaction sig = { 0 };) was accepted by GCC 5.4.0 without problems.

When I then run the program, I get very similar output to what cong reports getting in a comment:

before fork
before waitpid
handler 17
handler: No child processes
main: child 101681 exited with status 0x7F00
main: No child processes
after waitpid loop
after fork

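It is curious indeed that on Linux, the process is sent a SIGCHLD signal and yet wait() cannot collect the child in the signal handler. That is at least counter-intuitive. The output is consistent with the child having already been collected on behalf of the pending waitpid() before the handler runs: the handler's wait() then finds no children, while waitpid() in main() still returns the child's status.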

We can debate how much it matters that the first argument to waitpid() is pid rather than 0; the error is inevitable on the second iteration of the loop, since the first iteration collected the information from the child. In practice, it doesn't matter here. In general, it would be better to use waitpid(0, &status, WNOHANG) or thereabouts; depending on context, 0 instead of WNOHANG might be better.
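
The WNOHANG suggestion can be sketched as a SIGCHLD handler that reaps in a non-blocking loop and leaves all reporting to the main code. This is only an illustration under stated assumptions: the names reap_children, reaped and spawn_and_reap are hypothetical, and the sketch passes -1 (any child) as the first argument rather than 0 (any child in the caller's process group).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t reaped = 0;  /* children collected so far */

/* Reap every child that has already terminated, without blocking.
 * Only async-signal-safe calls are used; errno is saved and restored. */
static void reap_children(int signo)
{
    (void)signo;
    int saved_errno = errno;
    int status;
    while (waitpid(-1, &status, WNOHANG) > 0)  /* assumption: -1, any child */
        reaped++;
    errno = saved_errno;
}

/* Hypothetical driver: start n children that exit at once, then sleep
 * until the handler has reaped all of them.
 * Returns the number reaped, or -1 on error. */
int spawn_and_reap(int n)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = reap_children;
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGCHLD, &sa, NULL) < 0)
        return -1;

    /* Block SIGCHLD so no signal slips in between checking the counter
     * and going to sleep in sigsuspend(). */
    sigset_t block, old, wait_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
        return -1;
    wait_mask = old;
    sigdelset(&wait_mask, SIGCHLD);       /* SIGCHLD deliverable while waiting */

    reaped = 0;
    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            sigprocmask(SIG_SETMASK, &old, NULL);
            return -1;
        }
        if (pid == 0)
            _exit(127);                   /* child: exit immediately */
    }

    while (reaped < n)
        sigsuspend(&wait_mask);           /* atomically unblock and wait */

    sigprocmask(SIG_SETMASK, &old, NULL);
    return reaped;
}
```

Because the handler loops until waitpid() reports nothing more to collect, it copes with several children exiting while only one SIGCHLD is pending. The main code never calls wait() itself, so the handler and the main flow do not compete for the same child.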