prctl(PR_SET_PDEATHSIG) race condition

Posted 2019-05-28 11:00

Question:

As I understand it, the best way to terminate a child process when its parent dies is prctl(PR_SET_PDEATHSIG) (at least on Linux); see "How to make child process die after parent exits?"

There is one caveat to this mentioned in man prctl:

This value is cleared for the child of a fork(2) and (since Linux 2.4.36 / 2.6.23) when executing a set-user-ID or set-group-ID binary, or a binary that has associated capabilities (see capabilities(7)). This value is preserved across execve(2).

So, the following code has a race condition:

parent.c:

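#include <sys/types.h>
#include <unistd.h>

int main(int argc, char **argv) {
  pid_t f = fork(); // fork exactly once and test the result
  if (f == 0) {
    // child: replace the process image with ./child
    execl("./child", "child", (char *)NULL);
  }
  return 0;
}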

child.c:

#include <sys/prctl.h>
#include <signal.h>

int main(int argc, char **argv) {
  prctl(PR_SET_PDEATHSIG, SIGKILL); // ignore error checking for now
  // ...
  return 0;
}

Namely, the parent could die before prctl() is executed in the child (and thus the child will not receive the SIGKILL). The proper way to address this is to call prctl() in the forked child before the exec():

parent.c:

#include <sys/types.h>
#include <unistd.h>
#include <sys/prctl.h>
#include <signal.h>

int main(int argc, char **argv) {
  pid_t f = fork(); // fork exactly once and test the result
  if (f == 0) {
    // child: request SIGKILL on parent death, then exec
    prctl(PR_SET_PDEATHSIG, SIGKILL); // ignore error checking for now
    execl("./child", "child", (char *)NULL);
  }
  return 0;
}

child.c:

int main(int argc, char **argv) {
  // ...
  return 0;
}

However, if ./child is a setuid/setgid binary, then this trick to avoid the race condition doesn't work (exec()ing the setuid/setgid binary causes the PDEATHSIG to be lost as per the man page quoted above), and it seems like you are forced to employ the first (racy) solution.

Is there any way, if child is a setuid/setgid binary, to call prctl(PR_SET_PDEATHSIG) in a non-racy way?

Answer 1:

It is much more common to have the parent process set up a pipe. The parent process keeps the write end open (pipefd[1]) and closes the read end (pipefd[0]). The child process closes the write end (pipefd[1]) and sets the read end (pipefd[0]) nonblocking.

This way, the child process can use read(pipefd[0], buffer, 1) to check if the parent process is still alive. If the parent is still running, it will return -1 with errno == EAGAIN (or errno == EINTR). Once the parent has exited, so that the write end is closed, it will return 0.
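
A minimal sketch of that check follows, assuming the child has already closed its copy of the write end and that the parent rarely, if ever, writes to the pipe. The helper names set_nonblocking() and parent_alive() are hypothetical and chosen only for illustration:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to nonblocking mode, keeping its other flags.
   Returns 0 on success, -1 on failure (errno is set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: returns 1 if some process still holds the write end
   open, 0 if every write end is closed (the parent is gone), and -1 on an
   unexpected error. Assumes read_fd is the nonblocking read end. */
static int parent_alive(int read_fd)
{
    char buffer[1];
    ssize_t n;

    assert(read_fd >= 0);

    /* Retry if a signal interrupted the call. */
    do {
        n = read(read_fd, buffer, sizeof buffer);
    } while (n == -1 && errno == EINTR);

    if (n == 0)
        return 0;   /* end of file: all write ends are closed */
    if (n > 0)
        return 1;   /* a byte was pending; a later call reports EOF if the writer is gone */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 1;   /* nothing to read, but a writer still exists */
    return -1;
}
```

The child would call set_nonblocking(pipefd[0]) once after the fork, and parent_alive(pipefd[0]) whenever it wants to poll.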

Now, in Linux, the child process can also set the read end async, in which case it will be sent a signal (SIGIO by default) when the parent process exits:

fcntl(pipefd[0], F_SETSIG, desired_signal);
fcntl(pipefd[0], F_SETOWN, getpid());
fcntl(pipefd[0], F_SETFL, O_NONBLOCK | O_ASYNC);

Use a siginfo handler for desired_signal. If info->si_code == POLL_IN && info->si_fd == pipefd[0], the parent process either exited or wrote something to the pipe. Because read() is async-signal safe, and the pipe is nonblocking, you can use read(pipefd[0], &buffer, sizeof buffer) in the signal handler whether the parent wrote something, or if parent exited (closed the pipe). In the latter case, the read() will return 0.

As far as I can see, this approach has no race conditions (if you use a realtime signal, so that the signal is not lost because a user-sent one is already pending), although it is very Linux-specific. After setting the signal handler, and at any point during the lifetime of the child process, the child can always explicitly check if the parent is still alive, without affecting the signal generation.

So, to recap, in pseudocode:

Construct pipe
Fork child process

Child process:
    Close write end of pipe
    Install pipe signal handler (say, SIGRTMIN+0)
    Set read end of pipe to generate pipe signal (F_SETSIG)
    Set own PID as read end owner (F_SETOWN)
    Set read end of pipe nonblocking and async (F_SETFL, O_NONBLOCK | O_ASYNC)
    If read(pipefd[0], buffer, sizeof buffer) == 0,
        the parent process has already exited.

    Continue with normal work.

Child process pipe signal handler:
    If siginfo->si_code == POLL_IN and siginfo->si_fd == pipefd[0],
        parent process has exited.
        To immediately die, use e.g. raise(SIGKILL).    

Parent process:
    Close read end of pipe

    Continue with normal work.
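
The child-side steps of the recap above could be condensed into something like the following sketch. It assumes Linux (F_SETSIG), a parent that never writes to the pipe, and a child that has already closed the write end. The names death_fd, parent_gone, on_pipe_signal() and watch_parent() are hypothetical:

```c
#define _GNU_SOURCE   /* for F_SETSIG */
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static int death_fd = -1;                      /* read end of the pipe */
static volatile sig_atomic_t parent_gone = 0;  /* set once the parent is gone */

/* Signal handler: the kernel reports activity on death_fd. Since the parent
   is assumed never to write, POLL_IN means the write end was closed. */
static void on_pipe_signal(int signum, siginfo_t *info, void *context)
{
    (void)signum;
    (void)context;
    if (info->si_code == POLL_IN && info->si_fd == death_fd)
        parent_gone = 1;   /* or raise(SIGKILL) to die immediately */
}

/* Hypothetical helper: arrange for signum to be sent to this process when
   the write end of the pipe closes. Returns 0 on success, -1 on failure. */
static int watch_parent(int read_fd, int signum)
{
    struct sigaction act;
    char buffer[8];

    assert(read_fd >= 0);

    /* Install the handler before arming the descriptor. */
    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = on_pipe_signal;
    act.sa_flags = SA_SIGINFO;
    if (sigaction(signum, &act, NULL) == -1)
        return -1;

    death_fd = read_fd;
    if (fcntl(read_fd, F_SETSIG, signum) == -1 ||
        fcntl(read_fd, F_SETOWN, getpid()) == -1 ||
        fcntl(read_fd, F_SETFL, O_NONBLOCK | O_ASYNC) == -1)
        return -1;

    /* Catch a parent that exited before the descriptor was armed. */
    if (read(read_fd, buffer, sizeof buffer) == 0)
        parent_gone = 1;

    return 0;
}
```

After a successful watch_parent(pipefd[0], SIGRTMIN+0), the child's main loop would test parent_gone at convenient points.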

I do not expect you to believe my word.

Below is a crude example program you can use to check this behaviour yourself. It is long, but only because I wanted it to be easy to see what is happening at runtime. To implement this in a normal program, you only need a couple of dozen lines of code. example.c:

#define _GNU_SOURCE
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <errno.h>

static volatile sig_atomic_t done = 0;

static void handle_done(int signum)
{
    if (!done)
        done = signum;
}

static int install_done(const int signum)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = handle_done;
    act.sa_flags = 0;
    if (sigaction(signum, &act, NULL) == -1)
        return errno;

    return 0;
}

static int  deathfd = -1;

static void death(int signum, siginfo_t *info, void *context)
{
    if (info->si_code == POLL_IN && info->si_fd == deathfd)
        raise(SIGTERM);
}

static int install_death(const int signum)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = death;
    act.sa_flags = SA_SIGINFO;
    if (sigaction(signum, &act, NULL) == -1)
        return errno;

    return 0;
}

int main(void)
{
    pid_t  child, p;
    int    pipefd[2], status;
    char   buffer[8];

    if (install_done(SIGINT)) {
        fprintf(stderr, "Cannot set SIGINT handler: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    if (pipe(pipefd) == -1) {
        fprintf(stderr, "Cannot create control pipe: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    child = fork();
    if (child == (pid_t)-1) {
        fprintf(stderr, "Cannot fork child process: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    if (!child) {
        /*
         * Child process.
        */

        /* Close write end of pipe. */
        deathfd = pipefd[0];
        close(pipefd[1]);

        /* Set a SIGHUP signal handler. */
        if (install_death(SIGHUP)) {
            fprintf(stderr, "Child process: cannot set SIGHUP handler: %s.\n", strerror(errno));
            return EXIT_FAILURE;
        }

        /* Set SIGTERM signal handler. */
        if (install_done(SIGTERM)) {
            fprintf(stderr, "Child process: cannot set SIGTERM handler: %s.\n", strerror(errno));
            return EXIT_FAILURE;
        }

        /* We want a SIGHUP instead of SIGIO. */
        fcntl(deathfd, F_SETSIG, SIGHUP);

        /* We want the SIGHUP delivered when deathfd closes. */
        fcntl(deathfd, F_SETOWN, getpid());

        /* Make the deathfd (read end of pipe) nonblocking and async. */
        fcntl(deathfd, F_SETFL, O_NONBLOCK | O_ASYNC);

        /* Check if the parent process is dead. */
        if (read(deathfd, buffer, sizeof buffer) == 0) {
            printf("Child process (%ld): Parent process is already dead.\n", (long)getpid());
            return EXIT_FAILURE;
        }

        while (1) {
            status = __atomic_fetch_and(&done, 0, __ATOMIC_SEQ_CST);
            if (status == SIGINT)
                printf("Child process (%ld): SIGINT caught and ignored.\n", (long)getpid());
            else
            if (status)
                break;
            printf("Child process (%ld): Tick.\n", (long)getpid());
            fflush(stdout);
            sleep(1);

            status = __atomic_fetch_and(&done, 0, __ATOMIC_SEQ_CST);
            if (status == SIGINT)
                printf("Child process (%ld): SIGINT caught and ignored.\n", (long)getpid());
            else
            if (status)
                break;
            printf("Child process (%ld): Tock.\n", (long)getpid());
            fflush(stdout);
            sleep(1);
        }

        printf("Child process (%ld): Exited due to %s.\n", (long)getpid(),
               (status == SIGINT) ? "SIGINT" :
               (status == SIGHUP) ? "SIGHUP" :
               (status == SIGTERM) ? "SIGTERM" : "an unknown signal");
        fflush(stdout);

        return EXIT_SUCCESS;
    }

    /*
     * Parent process.
    */

    /* Close read end of pipe. */
    close(pipefd[0]);

    while (!done) {
        fprintf(stderr, "Parent process (%ld): Tick.\n", (long)getpid());
        fflush(stderr);
        sleep(1);
        fprintf(stderr, "Parent process (%ld): Tock.\n", (long)getpid());
        fflush(stderr);
        sleep(1);

        /* Try reaping the child process. */
        p = waitpid(child, &status, WNOHANG);
        if (p == child || (p == (pid_t)-1 && errno == ECHILD)) {
            if (p == child && WIFSIGNALED(status))
                fprintf(stderr, "Child process died from %s. Parent will now exit, too.\n",
                        (WTERMSIG(status) == SIGINT) ? "SIGINT" :
                        (WTERMSIG(status) == SIGHUP) ? "SIGHUP" :
                        (WTERMSIG(status) == SIGTERM) ? "SIGTERM" : "an unknown signal");
            else
                fprintf(stderr, "Child process has exited, so the parent will too.\n");
            fflush(stderr);
            break;
        }
    }

    if (done) {
        fprintf(stderr, "Parent process (%ld): Exited due to %s.\n", (long)getpid(),
                   (done == SIGINT) ? "SIGINT" :
                   (done == SIGHUP) ? "SIGHUP" : "an unknown signal");
        fflush(stderr);
    }

    /* Parent process exits here. */
    return EXIT_SUCCESS;
}

Compile and run the above using e.g.

gcc -Wall -O2 example.c -o example
./example

The parent process will print to standard error, and the child process to standard output. The parent process will exit if you press Ctrl+C; the child process will ignore that signal. The child process uses SIGHUP instead of SIGIO (although a realtime signal, say SIGRTMIN+0, would be safer); if generated by the parent process exiting, the SIGHUP signal handler will raise SIGTERM in the child.

To make the termination causes easy to see, the child catches SIGTERM, and exits on the next iteration (up to a second later). If so desired, the handler can use e.g. raise(SIGKILL) to terminate itself immediately.

Both parent and child processes show their process IDs, so you can easily send a SIGINT/SIGHUP/SIGTERM signal from another terminal window. (The child process ignores SIGINT and SIGHUP sent from outside the process.)



Answer 2:

I don't know this for sure, but clearing the parent death signal on execve when invoking a set-id binary looks like an intentional restriction for security reasons. I'm not sure why, considering that you can use kill to send signals to setuid programs that share your real user ID, but they wouldn't have bothered making that change in 2.6.23 if there wasn't a reason to disallow it.

Since you control the code of the set-id child, here is a kludge: make the call to prctl, then immediately afterward, call getppid and see if it returns 1. If it does, then either the process was started directly by init (which is not as rare as it used to be) or the process was reparented to init before it had a chance to call prctl, which means its original parent is dead and it should exit.

(This is a kludge because I know of no way to rule out the possibility that the process was started directly by init. init never exits, so you have one case where it should exit and one case where it shouldn't and no way to tell which. But if you know from the larger design that the process will not be started directly by init, it should be reliable.)

(You must call getppid after prctl, or you have only narrowed the race window, not eliminated it.)
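
A minimal sketch of the kludge, meant to run at the very start of the set-id child's main(). The helper name arm_parent_death_signal() is hypothetical, and the check assumes that an orphaned process is reparented to PID 1 and that the child is never started directly by init:

```c
#include <assert.h>
#include <signal.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical helper: request signum on parent death, then check whether
   the parent was already gone before the request took effect.
   Returns  0 if the request is armed and the parent still exists,
            1 if the process has already been reparented to PID 1,
           -1 if prctl() failed (errno is set). */
static int arm_parent_death_signal(int signum)
{
    assert(signum > 0);

    if (prctl(PR_SET_PDEATHSIG, (unsigned long)signum) == -1)
        return -1;

    /* This check must come after prctl(): if the parent dies from here on,
       the kernel delivers signum; if it died earlier, getppid() shows it.
       Assumption: orphans are reparented to PID 1. */
    if (getppid() == 1)
        return 1;

    return 0;
}
```

The child would exit immediately if the helper returns 1, and continue normally if it returns 0.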