Linux: system() + SIGCHLD handling + multi

Posted 2019-03-13 23:07

I have a multithreaded application that installs a SIGCHLD handler that logs and reaps child processes.
The problem starts when I call system(). system() has to wait for its child process to end and reap it itself, since it needs the exit code. That is why it calls sigprocmask() to block SIGCHLD. But in my multithreaded application, the SIGCHLD handler still runs in a different thread, and the child is reaped before system() has a chance to do so.

Is this a known problem in POSIX?

One workaround I thought of is to block SIGCHLD in all other threads, but that is not realistic in my case, since not all threads are created directly by my code.
What other options do I have?

4 answers
倾城 Initia
Reply #2 · 2019-03-13 23:46

For those who are still looking for the answer, there is an easier way to solve this problem:

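Rewrite the SIGCHLD handler to call waitid() with the flags WEXITED|WNOHANG|WNOWAIT, so that it can check each child's PID before reaping it (waitid() requires at least one of WEXITED, WSTOPPED, or WCONTINUED in addition to the other two flags). You can optionally check /proc/PID/stat (or a similar OS interface) for the command name.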
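
A minimal sketch of that idea, under some assumptions: the application records every PID it forks itself in a small table, and the handler only reaps PIDs found there. The names own_child(), reap_owned_children(), peeking_sigchld_handler() and install_peeking_handler() are made up for this illustration, and the table uses the GCC __sync built-ins.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* fork() for the callers of own_child() */

#define MAX_OWNED 64

/* Hypothetical registry of the PIDs this application forked itself.
 * 0 marks a free slot. */
static pid_t owned_pid[MAX_OWNED];

/* Record a PID we forked. Returns 0 on success, -1 if the table is full. */
int own_child(pid_t pid)
{
    for (int i = 0; i < MAX_OWNED; i++)
        if (__sync_bool_compare_and_swap(&owned_pid[i], (pid_t)0, pid))
            return 0;
    return -1;
}

/* Remove a PID from the table. Returns 1 if it was there, 0 otherwise. */
static int disown_child(pid_t pid)
{
    for (int i = 0; i < MAX_OWNED; i++)
        if (__sync_bool_compare_and_swap(&owned_pid[i], pid, (pid_t)0))
            return 1;
    return 0;
}

/* Peek at exited children without reaping them (WNOWAIT), and reap
 * only those found in the table. Returns the number reaped. */
int reap_owned_children(void)
{
    const int saved_errno = errno;
    int reaped = 0;

    for (;;) {
        siginfo_t info;
        int status;

        info.si_pid = 0;    /* lets us detect "nothing has exited" */
        if (waitid(P_ALL, (id_t)0, &info, WEXITED | WNOHANG | WNOWAIT) == -1)
            break;          /* no children at all (ECHILD) */
        if (info.si_pid == 0)
            break;          /* children exist, but none has exited */
        if (!disown_child(info.si_pid))
            break;          /* not ours, e.g. system()'s child: leave it */
        if (waitpid(info.si_pid, &status, WNOHANG) == info.si_pid)
            reaped++;       /* log info.si_pid and status here */
    }

    errno = saved_errno;
    return reaped;
}

/* SIGCHLD handler: waitid() and waitpid() are async-signal safe. */
void peeking_sigchld_handler(int signum)
{
    (void)signum;
    reap_owned_children();
}

/* Install the handler. Returns 0 on success, -1 on error. */
int install_peeking_handler(void)
{
    struct sigaction act;

    sigemptyset(&act.sa_mask);
    act.sa_handler = peeking_sigchld_handler;
    act.sa_flags = SA_NOCLDSTOP | SA_RESTART;
    return sigaction(SIGCHLD, &act, NULL);
}
```

Two caveats with this sketch: own_child() has to run before the child can be reaped (for example with SIGCHLD blocked around fork()), and the peek stops at the first exited child that is not in the table, so an owned child that exits at the same moment may only be reaped on a later SIGCHLD.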

别忘想泡老子
Reply #3 · 2019-03-13 23:47

Yes, it's a known (or at least strongly intimated) problem.

Blocking SIGCHLD while waiting for the child to terminate prevents the application from catching the signal and obtaining status from system()'s child process before system() can get the status itself. .... Note that if the application is catching SIGCHLD signals, it will receive such a signal before a successful system() call returns.

(From the documentation for system(), emphasis added.)

So, POSIXly you are out of luck, unless your implementation happens to queue SIGCHLD. If it does, you can of course keep a record of pids you forked, and then only reap the ones you were expecting.
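
As a rough sketch of the "keep a record of pids" idea: rather than calling waitpid(-1, ...), the handler can poll each recorded PID with waitpid(pid, ..., WNOHANG), so it never touches a child it did not fork, and it does not matter whether several SIGCHLDs were collapsed into one. The names track_child() and reap_tracked() are hypothetical, and the table uses the GCC __sync built-ins.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>   /* fork() for the callers of track_child() */

#define MAX_TRACKED 64

/* PIDs forked by our own code; 0 marks a free slot. */
static pid_t tracked_pid[MAX_TRACKED];

/* Record a PID we forked. Returns 0 on success, -1 if the table is full. */
int track_child(pid_t pid)
{
    for (int i = 0; i < MAX_TRACKED; i++)
        if (__sync_bool_compare_and_swap(&tracked_pid[i], (pid_t)0, pid))
            return 0;
    return -1;
}

/* Reap every recorded child that has exited; meant to be called from
 * the SIGCHLD handler. Children that are not in the table are never
 * waited for. Returns the number of children reaped. */
int reap_tracked(void)
{
    const int saved_errno = errno;
    int reaped = 0;

    for (int i = 0; i < MAX_TRACKED; i++) {
        /* Atomic read of the slot. */
        const pid_t pid = __sync_fetch_and_or(&tracked_pid[i], (pid_t)0);
        int status;

        if (pid <= 0)
            continue;
        if (waitpid(pid, &status, WNOHANG) == pid) {
            /* Log pid and status here, then free the slot. */
            __sync_bool_compare_and_swap(&tracked_pid[i], pid, (pid_t)0);
            reaped++;
        }
    }

    errno = saved_errno;
    return reaped;
}
```

The cost is one waitpid() call per recorded PID on every SIGCHLD, which is usually acceptable for a small table.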

Linuxly, too, you are out of luck, as signalfd appears also to collapse multiple SIGCHLDs.

UNIXly, however, you have lots of clever and too-clever techniques available to manage your own children and ignore those of third-party routines. I/O multiplexing of inherited pipes is one alternative to SIGCHLD catching, as is using a small, dedicated "spawn-helper" to do your forking and reaping in a separate process.
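
To illustrate the inherited-pipe alternative, here is a small sketch that assumes no SIGCHLD handler reaps the child behind our back. The child inherits the write end of a pipe; when it (and anything it passed the descriptor to) exits, the read end reports end-of-file, which poll() can multiplex with other descriptors. The struct piped_child type and the spawn_piped() and wait_piped() functions are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct piped_child {
    pid_t pid;   /* child process */
    int   fd;    /* read end of the pipe the child inherited */
};

/* Start "sh -c command" with the write end of a pipe left open in it.
 * Returns 0 on success, -1 on error. */
int spawn_piped(struct piped_child *pc, const char *command)
{
    int fds[2];

    if (pipe(fds) == -1)
        return -1;

    pc->pid = fork();
    if (pc->pid == (pid_t)-1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pc->pid == 0) {
        /* Child: keep only the write end; it stays open across exec. */
        close(fds[0]);
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);
    }

    /* Parent: keep only the read end. */
    close(fds[1]);
    pc->fd = fds[0];
    return 0;
}

/* Wait up to timeout_ms for the pipe to report hangup or end-of-file,
 * then reap that specific child. Returns the wait status, or -1 on
 * timeout or error. */
int wait_piped(struct piped_child *pc, int timeout_ms)
{
    struct pollfd pfd = { .fd = pc->fd, .events = POLLIN };
    int status, n;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);
    if (n <= 0)
        return -1;

    close(pc->fd);
    pc->fd = -1;

    while (waitpid(pc->pid, &status, 0) == -1)
        if (errno != EINTR)
            return -1;
    return status;
}
```

One thing to watch for in a multithreaded program: another thread that forks at the wrong moment also inherits the write end, which delays the end-of-file until that process closes it or exits.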

虎瘦雄心在
Reply #4 · 2019-03-13 23:47

Since you have threads you cannot control, I recommend you write a preloaded library to interpose the system() call (and perhaps also popen() etc.) with your own implementation. I'd include your SIGCHLD handler in the library, too.

If you don't want to run your program via env LD_PRELOAD=libwhatever.so yourprogram, you can add something like

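/* At the start of main(); needs <stdlib.h> and <unistd.h>. */
const char *libs;

libs = getenv("LD_PRELOAD");
if (!libs || !*libs) {
    /* Not preloaded yet: set LD_PRELOAD and re-execute ourselves. */
    setenv("LD_PRELOAD", "libwhatever.so", 1);
    execv(argv[0], argv);   /* argv[0] must be a usable path to the binary */
    _exit(127);             /* only reached if execv() failed */
}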

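at the start of your program, to have it re-execute itself with LD_PRELOAD appropriately set. (Note that there are quirks to consider if your program is setuid or setgid; see man ld.so for details. In particular, if libwhatever.so is not installed in a system library directory, you must specify a full path.) Also note that this check relies on LD_PRELOAD still being set after the re-execution: a library whose constructor unsets LD_PRELOAD, as the example library below does, would make the program re-execute itself forever, so in that case test a separate marker environment variable instead.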

One possible approach would be to use a lockless array (using atomic built-ins provided by the C compiler) of pending children. Your system() implementation allocates one of the entries, stores the child PID in it, and waits on a semaphore for the child to exit instead of calling waitpid().

Here is an example implementation:

#define  _GNU_SOURCE
#define  _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <signal.h>
#include <semaphore.h>
#include <dlfcn.h>
#include <errno.h>

/* Maximum number of concurrent children waited for.
*/
#define  MAX_CHILDS  256

/* Lockless array of child processes waited for.
*/
static pid_t  child_pid[MAX_CHILDS] = { 0 }; /* 0 is not a valid PID */
static sem_t  child_sem[MAX_CHILDS];
static int    child_status[MAX_CHILDS];

/* Helper function: allocate a child process.
 * Returns the index, or -1 if all in use.
*/
static inline int child_get(const pid_t pid)
{
    int i = MAX_CHILDS;
    while (i-->0)
        if (__sync_bool_compare_and_swap(&child_pid[i], (pid_t)0, pid)) {
            sem_init(&child_sem[i], 0, 0);
            return i;
        }
    return -1;
}

/* Helper function: release a child descriptor.
*/
static inline void child_put(const int i)
{
    sem_destroy(&child_sem[i]);
    __sync_fetch_and_and(&child_pid[i], (pid_t)0);
}

/* SIGCHLD signal handler.
 * Note: Both waitpid() and sem_post() are async-signal safe.
 * errno is saved and restored, because waitpid() may change it
 * under the code that was interrupted.
*/
static void sigchld_handler(int signum __attribute__((unused)),
                            siginfo_t *info __attribute__((unused)),
                            void *context __attribute__((unused)))
{
    const int saved_errno = errno;
    pid_t p;
    int   status, i;

    while (1) {
        p = waitpid((pid_t)-1, &status, WNOHANG);
        if (p == (pid_t)0 || p == (pid_t)-1)
            break;

        i = MAX_CHILDS;
        while (i-->0)
            if (p == __sync_fetch_and_or(&child_pid[i], (pid_t)0)) {
                child_status[i] = status;
                sem_post(&child_sem[i]);
                break;
            }

        /* Log p and status? */
    }

    errno = saved_errno;
}

/* Helper function: close descriptor, without affecting errno.
*/
static inline int closefd(const int fd)
{
    int  result, saved_errno;

    if (fd == -1)
        return EINVAL;

    saved_errno = errno;

    do {
        result = close(fd);
    } while (result == -1 && errno == EINTR);
    if (result == -1)
        result = errno;
    else
        result = 0;

    errno = saved_errno;

    return result;
}

/* Helper function: Create a close-on-exec socket pair.
*/
static int commsocket(int fd[2])
{
    int  result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
        fd[0] = -1;
        fd[1] = -1;
        return errno;
    }

    do {
        result = fcntl(fd[0], F_SETFD, FD_CLOEXEC);
    } while (result == -1 && errno == EINTR);
    if (result == -1) {
        closefd(fd[0]);
        closefd(fd[1]);
        return errno;
    }

    do {
        result = fcntl(fd[1], F_SETFD, FD_CLOEXEC);
    } while (result == -1 && errno == EINTR);
    if (result == -1) {
        closefd(fd[0]);
        closefd(fd[1]);
        return errno;
    }

    return 0;
}

/* New system() implementation.
*/
int system(const char *command)
{
    pid_t   child;
    int     i, status, commfd[2];
    ssize_t n;

    /* Allocate the child process. */
    i = child_get((pid_t)-1);
    if (i < 0) {
        /* "fork failed" */
        errno = EAGAIN;
        return -1;
    }

    /* Create a close-on-exec socket pair. */
    if (commsocket(commfd)) {
        child_put(i);
        /* "fork failed" */
        errno = EAGAIN;
        return -1;
    }

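    /* Create the child process. */
    child = fork();
    if (child == (pid_t)-1) {
        /* Release the socket pair and the array entry,
         * keeping the errno set by fork(). */
        const int saved_errno = errno;
        closefd(commfd[0]);
        closefd(commfd[1]);
        child_put(i);
        errno = saved_errno;
        return -1;
    }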

    /* Child process? */
    if (!child) {
        char *args[4] = { "sh", "-c", (char *)command, NULL };

        /* If command is NULL, return 7 if sh is available. */
        if (!command)
            args[2] = "exit 7";

        /* Close parent end of comms socket. */
        closefd(commfd[0]);

        /* Receive one char before continuing. */
        do {
            n = read(commfd[1], &status, 1);
        } while (n == (ssize_t)-1 && errno == EINTR);
        if (n != 1) {
            closefd(commfd[1]);
            _exit(127);
        }

        /* We won't receive anything else. */
        shutdown(commfd[1], SHUT_RD);

        /* Execute the command. If successful, this closes the comms socket. */
        execv("/bin/sh", args);

        /* Failed. Return the errno to the parent. */
        status = errno;
        {
            const char       *p = (const char *)&status;
            const char *const q = (const char *)&status + sizeof status;

            while (p < q) {
                n = write(commfd[1], p, (size_t)(q - p));
                if (n > (ssize_t)0)
                    p += n;
                else
                if (n != (ssize_t)-1)
                    break;
                else
                if (errno != EINTR)
                    break;
            }
        }

        /* Explicitly close the socket pair. */
        shutdown(commfd[1], SHUT_RDWR);
        closefd(commfd[1]);
        _exit(127);
    }

    /* Parent process. Close the child end of the comms socket. */
    closefd(commfd[1]);

    /* Update the child PID in the array. */
    __sync_bool_compare_and_swap(&child_pid[i], (pid_t)-1, child);

    /* Let the child proceed, by sending a char via the socket. */
    status = 0;
    do {
        n = write(commfd[0], &status, 1);
    } while (n == (ssize_t)-1 && errno == EINTR);
    if (n != 1) {
        /* Release the child entry. */
        child_put(i);
        closefd(commfd[0]);

        /* Kill the child. */
        kill(child, SIGKILL);

        /* "fork failed". */
        errno = EAGAIN;
        return -1;
    }

    /* Won't send anything else over the comms socket. */
    shutdown(commfd[0], SHUT_WR);

    /* Try reading an int from the comms socket. */
    {
        char       *p = (char *)&status;
        char *const q = (char *)&status + sizeof status;

        while (p < q) {
            n = read(commfd[0], p, (size_t)(q - p));
            if (n > (ssize_t)0)
                p += n;
            else
            if (n != (ssize_t)-1)
                break;
            else
            if (errno != EINTR)
                break;
        }

        /* Socket closed with nothing read? */
        if (n == (ssize_t)0 && p == (char *)&status)
            status = 0;
        else
        if (p != q)
            status = EAGAIN; /* Incomplete error code, use EAGAIN. */

        /* Close the comms socket. */
        shutdown(commfd[0], SHUT_RDWR);
        closefd(commfd[0]);
    }

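    /* Wait for the command to complete.
     * sem_wait() may fail with EINTR if a signal handler interrupts it. */
    while (sem_wait(&child_sem[i]) == -1 && errno == EINTR)
        ;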

    /* Did the command execution fail? */
    if (status) {
        child_put(i);
        errno = status;
        return -1;
    }

    /* Command was executed. Return the exit status. */
    status = child_status[i];
    child_put(i);

    /* If command is NULL, then the return value is nonzero
     * iff the exit status was 7. */
    if (!command) {
        if (WIFEXITED(status) && WEXITSTATUS(status) == 7)
            status = 1;
        else
            status = 0;
    }

    return status;
}

/* Library initialization.
 * Sets the sigchld handler,
 * makes sure pthread library is loaded, and
 * unsets the LD_PRELOAD environment variable.
*/
static void init(void) __attribute__((constructor));
static void init(void)
{
    struct sigaction  act;
    int               saved_errno;

    saved_errno = errno;

    sigemptyset(&act.sa_mask);
    act.sa_sigaction = sigchld_handler;
    act.sa_flags = SA_NOCLDSTOP | SA_RESTART | SA_SIGINFO;

    sigaction(SIGCHLD, &act, NULL);

    (void)dlopen("libpthread.so.0", RTLD_NOW | RTLD_GLOBAL);

    unsetenv("LD_PRELOAD");

    errno = saved_errno;
}

If you save the above as say child.c, you can compile it into libchild.so using

gcc -W -Wall -O3 -fPIC -c child.c
gcc -W -Wall -O3 -shared -Wl,-soname,libchild.so child.o -ldl -lpthread -o libchild.so

If you have a test program that does system() calls in various threads, you can run it with system() interposed (and children automatically reaped) using

env LD_PRELOAD=/path/to/libchild.so test-program
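
In case you do not have such a test program at hand, the following is a rough sketch of one, with made-up names (struct job, system_worker(), run_system_threads()). Each thread runs system("exit N") with its own N and checks that it gets exactly that exit status back; a SIGCHLD handler that steals the child typically shows up as system() returning -1 instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>

#define MAX_THREADS 16

struct job {
    pthread_t thread;
    int       code;   /* exit status the command should produce */
    int       ok;     /* set to 1 if system() reported that status */
};

/* Thread body: run "exit <code>" through system() and verify the result. */
static void *system_worker(void *arg)
{
    struct job *const j = arg;
    char cmd[32];
    int  status;

    snprintf(cmd, sizeof cmd, "exit %d", j->code);
    status = system(cmd);
    j->ok = (status != -1 && WIFEXITED(status) &&
             WEXITSTATUS(status) == j->code);
    return NULL;
}

/* Run system() concurrently in up to MAX_THREADS threads.
 * Returns the number of threads that saw the expected exit status. */
int run_system_threads(int count)
{
    struct job jobs[MAX_THREADS];
    int started = 0, ok = 0;

    if (count > MAX_THREADS)
        count = MAX_THREADS;

    for (int i = 0; i < count; i++) {
        jobs[i].code = i + 1;
        jobs[i].ok = 0;
        if (pthread_create(&jobs[i].thread, NULL, system_worker, &jobs[i]))
            break;
        started++;
    }

    for (int i = 0; i < started; i++) {
        pthread_join(jobs[i].thread, NULL);
        ok += jobs[i].ok;
    }

    return ok;
}
```

A main() that calls run_system_threads(8) and prints the result, compiled with gcc -pthread, should report 8 both with and without the preloaded library.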

Note that depending on exactly what the threads that are not under your control do, you may need to interpose further functions, including signal(), sigaction(), sigprocmask(), pthread_sigmask(), and so on, to make sure those threads do not change the disposition of SIGCHLD after the libchild.so library has installed your handler.

If those out-of-control threads use popen(), you can interpose that (and pclose()) with very similar code to system() above, just split into two parts.

(If you are wondering why my system() code bothers to report the exec() failure to the parent process, it's because I normally use a variant of this code that takes the command as an array of strings; this way it correctly reports if the command was not found, or could not be executed due to insufficient privileges, etc. In this particular case the command is always /bin/sh. However, since the communications socket is needed anyway to avoid racing between child exit and having up-to-date PID in the child_pid[] array, I decided to leave the "extra" code in.)

神经病院院长
Reply #5 · 2019-03-14 00:03

Replace system() with proc_system().
