Cleaning up children processes asynchronously

Posted 2019-07-02 00:49

Question:

This is an example from Advanced Linux Programming, section 3.4.4. The program calls fork() and exec() to start a child process. Instead of blocking until the child terminates, I want the parent process to clean up the child asynchronously (otherwise the child becomes a zombie process). This can be done with the SIGCHLD signal: by installing a signal handler, the clean-up is performed when the child process ends. The code is the following:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <sys/wait.h>
#include <signal.h>
#include <string.h>

int spawn(char *program, char **arg_list){
    pid_t child_pid;

     child_pid = fork();
     if(child_pid == 0){    // it is the child process
        execvp(program, arg_list);
        fprintf(stderr, "An error occurred in execvp\n");
        _exit(EXIT_FAILURE);    // execvp only returns on failure; do not fall back into the parent's code
     }
     else{
        return child_pid;
     }
}

int child_exit_status;

void clean_up_child_process (int signal_number){
    int status;
    wait(&status);
    child_exit_status = status;     // store the exit status in a global variable
    printf("Cleaning child process is taken care of by SIGCHLD.\n");
}

int main()
{
    /* Handle SIGCHLD by calling clean_up_child_process; */
    struct sigaction sigchld_action;
    memset(&sigchld_action, 0, sizeof(sigchld_action));
    sigchld_action.sa_handler = &clean_up_child_process;
    sigaction(SIGCHLD, &sigchld_action, NULL);

    int child_status;
    char *arg_list[] = {    //deprecated conversion from string constant to char*
        "ls", 
        "-la",
        ".",
        NULL
    };

    spawn("ls", arg_list);

    return 0;
}

However, when I run the program in the terminal, the parent process never seems to end. It also does not appear to execute the function clean_up_child_process, since it never prints "Cleaning child process is taken care of by SIGCHLD." What is wrong with this code?

Answer 1:

For GNU/Linux users

I have already read this book. It mentions this mechanism only briefly.

Quote from section 3.4.4, page 59 of the book:

A more elegant solution is to notify the parent process when a child terminates.

Beyond that, it just says that you can use sigaction to handle this situation.


Here is a complete example of how to handle processes in this way.

First, why would we use this mechanism at all? Because we do not want the parent to wait for each process in turn.

real example
Imagine that you have 10 .mp4 files and you want to convert them to .mp3 files. A junior user does this:

ffmpeg -i 01.mp4 01.mp3 

and repeats this command 10 times. A slightly more experienced user does this:

ls *.mp4 | xargs -I xxx ffmpeg -i xxx xxx.mp3

This time the 10 file names are piped to xargs, one per line, and xargs converts them to mp3 one after another.

But a senior user does this:

ls *.mp4 | xargs -I xxx -P 0 ffmpeg -i xxx xxx.mp3

This means that if I have 10 files, xargs creates 10 processes and runs them simultaneously. That is a big difference. With the two previous commands only one ffmpeg process ran at a time: it was created, it terminated, and only then did the next one start. With the -P 0 option, all 10 ffmpeg processes are created at once and run at the same time.


Now the purpose of cleaning up children asynchronously becomes clearer. We want to start several new processes, but their order, and perhaps their exit status, does not matter to us. This way we can run them in parallel and reduce the total time.


First, see man sigaction for more details.

Second, you can look up the signal number with:

T ❱ kill -l | grep SIGCHLD
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP

sample code

objective: use SIGCHLD to clean up child processes

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

volatile sig_atomic_t signal_counter;    // volatile: modified inside the signal handler

void signal_handler( int signal_number )
{
    // NOTE: printf(), fprintf() and perror() are not async-signal-safe;
    // they are used here only to keep the demo short
    ++signal_counter;
    int wait_status;
    pid_t return_pid = wait( &wait_status );
    if( return_pid == -1 )
    {
        perror( "wait()" );
        return;    // wait_status is not valid if wait() failed
    }
    if( WIFEXITED( wait_status ) )
    {
        printf ( "job [ %d ] | pid: %d | exit status: %d\n", (int) signal_counter, (int) return_pid, WEXITSTATUS( wait_status ) );
    }
    else
    {
        printf( "exit abnormally\n" );
    }

    fprintf( stderr, "the signal %d was received\n", signal_number );
}

int main()
{
    // now instead of signal function we want to use sigaction
    struct sigaction siac;

    // zero it
    memset( &siac, 0, sizeof( struct sigaction ) );

    siac.sa_handler = signal_handler;
    sigaction( SIGCHLD, &siac, NULL );

    pid_t child_pid;

    char* sleep_argument[ 5 ] = { "3", "4", "5", "7", "9" };

    int counter = 0;

    while( counter <= 5 )
    {
        if( counter == 5 )
        {
            while( counter-- )
            {
                pause();
            }

            break;
        }

        child_pid = fork();

        // on failure fork() returns -1
        if( child_pid == -1 )
        {
            perror( "fork()" );
            exit( 1 );
        }

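        // for child process fork() returns 0
        if( child_pid == 0 ){
            execlp( "sleep", "sleep", sleep_argument[ counter ], (char *) NULL );
            // execlp() only returns on failure; do not let the child continue the loop
            perror( "execlp()" );
            _exit( 127 );
        }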

        ++counter;
    }

    fprintf( stderr, "signal counter %d\n", signal_counter );

    // the main return value
    return 0;
}

This is what the sample code does:

  1. creates 5 child processes
  2. enters the inner while loop and pauses until a signal arrives (see man pause)
  3. when a child terminates, the parent process wakes up and the signal_handler function is called
  4. continues in this way up to the last child: sleep 9

output: (17 means SIGCHLD)

ALP ❱ ./a.out 
job [ 1 ] | pid: 14864 | exit status: 0
the signal 17 was received
job [ 2 ] | pid: 14865 | exit status: 0
the signal 17 was received
job [ 3 ] | pid: 14866 | exit status: 0
the signal 17 was received
job [ 4 ] | pid: 14867 | exit status: 0
the signal 17 was received
job [ 5 ] | pid: 14868 | exit status: 0
the signal 17 was received
signal counter 5

While this sample code is running, try this in another terminal:

ALP ❱ ps -o time,pid,ppid,cmd --forest -g $(pgrep -x bash)
    TIME   PID  PPID CMD
00:00:00  5204  2738 /bin/bash
00:00:00  2742  2738 /bin/bash
00:00:00  4696  2742  \_ redshift
00:00:00 14863  2742  \_ ./a.out
00:00:00 14864 14863      \_ sleep 3
00:00:00 14865 14863      \_ sleep 4
00:00:00 14866 14863      \_ sleep 5
00:00:00 14867 14863      \_ sleep 7
00:00:00 14868 14863      \_ sleep 9

As you can see, the a.out process has 5 children, and they are running simultaneously. Whenever one of them terminates, the kernel sends the SIGCHLD signal to its parent, a.out.

NOTE

If we do not use pause() or some other mechanism that lets the parent wait for its children, the parent exits first and abandons the processes it created; init (upstart on Ubuntu) then becomes their parent. You can try this by removing pause().



Answer 2:

The parent process returns from main() immediately after fork() returns the child pid, so it never has the opportunity to wait for the child to terminate.



Answer 3:

I'm using a Mac, so my answer may not be entirely relevant, but still. I compiled without any options, so the executable name is a.out.

I had the same experience in the console (the process doesn't seem to terminate), but I noticed that it is just a terminal glitch: you can simply press Enter and your command line comes back, and ps run from another terminal window shows neither a.out nor the ls it launched.

Also, if I run ./a.out >/dev/null it finishes immediately.

So the point of the above is that everything actually terminates; the terminal just appears to freeze for some reason.

Next, why does it never print "Cleaning child process is taken care of by SIGCHLD."? Simply because the parent process terminates before the child. The SIGCHLD signal cannot be delivered to a process that has already terminated, so the handler is never invoked.

The book says that the parent process continues to do other things, and if it really does, then everything works fine, for example if you add sleep(1) after spawn().