I have a multithreaded application that installs a handler for SIGCHLD that logs and reaps the child processes.
The problem starts when I call system(). system() needs to wait for the child process to end and reap it itself, since it needs the exit code. This is why it calls sigprocmask() to block SIGCHLD. But in my multithreaded application, SIGCHLD is still delivered to a different thread, and the child is reaped by my handler before system() has a chance to do so.
Is this a known problem in POSIX?
One way around this I thought of is to block SIGCHLD in all other threads but this is not really realistic in my case since not all threads are directly created by my code.
What other options do I have?
For those who are still looking for the answer, there is an easier way to solve this problem:
Rewrite the SIGCHLD handler to call waitid() with the flags WNOHANG|WNOWAIT, so that it can check each child's PID before reaping it. Optionally, it can also check /proc/PID/stat (or a similar OS interface) for the command name.
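A minimal sketch of that peek-then-reap handler follows. The registry (owned_pids, own_pid, find_owned) and the function names are hypothetical, introduced only for illustration, and the registry is deliberately simplistic. Note that waitid() also needs WEXITED to report terminated children.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_OWNED 64

/* Hypothetical registry of PIDs this code forked itself; 0 marks a free slot. */
static volatile sig_atomic_t owned_pids[MAX_OWNED];

/* Return the slot index holding pid, or -1 if it is not registered. */
int find_owned(pid_t pid) {
    for (int i = 0; i < MAX_OWNED; i++)
        if (owned_pids[i] == (sig_atomic_t)pid)
            return i;
    return -1;
}

/* Register a child after fork(). Not thread-safe; a real version would
 * need atomics or a lock. Returns 0 on success, -1 if the table is full. */
int own_pid(pid_t pid) {
    int slot = find_owned(0);            /* first free slot */
    if (slot < 0)
        return -1;
    owned_pids[slot] = (sig_atomic_t)pid;
    return 0;
}

/* Peek at waitable children without reaping them (WNOWAIT) and reap only
 * registered ones. Returns the number of children actually reaped. */
int reap_owned_children(void) {
    int reaped = 0;
    for (;;) {
        siginfo_t info;
        info.si_pid = 0;                 /* stays 0 if nothing is waitable */
        if (waitid(P_ALL, 0, &info, WEXITED | WNOHANG | WNOWAIT) == -1
            || info.si_pid == 0)
            break;
        int slot = find_owned(info.si_pid);
        if (slot < 0)
            break;                       /* someone else's child: leave it */
        if (waitid(P_PID, (id_t)info.si_pid, &info, WEXITED | WNOHANG) == 0
            && info.si_pid != 0)
            reaped++;
        owned_pids[slot] = 0;
    }
    return reaped;
}

void sigchld_handler(int signo) {
    (void)signo;
    int saved_errno = errno;             /* waitid() may clobber errno */
    reap_owned_children();
    errno = saved_errno;
}

int install_sigchld_handler(void) {
    struct sigaction sa;
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

Two limitations of this sketch: a foreign zombie at the head of the queue hides owned children behind it until it is reaped, and a child that exits before own_pid() runs is only picked up if reap_owned_children() is called again after registering it.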
Yes, it's a known (or at least strongly intimated) problem.
(From the documentation for system(), emphasis added.)
So, POSIXly you are out of luck, unless your implementation happens to queue SIGCHLD. If it does, you can of course keep a record of pids you forked, and then only reap the ones you were expecting.
Linuxly, too, you are out of luck, as signalfd appears also to collapse multiple SIGCHLDs.
UNIXly, however, you have lots of clever and too-clever techniques available to manage your own children and ignore those of third-party routines. I/O multiplexing of inherited pipes is one alternative to SIGCHLD catching, as is using a small, dedicated "spawn-helper" to do your forking and reaping in a separate process.
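The inherited-pipe technique can be sketched roughly as follows: the child inherits the write end of a pipe and never touches it, so the read end reports end-of-file only when the child (and anything that inherited the descriptor from it) has exited. The names spawn_shell, wait_child, and struct child are hypothetical, chosen for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct child {
    pid_t pid;      /* PID to pass to waitpid() later */
    int   exit_fd;  /* read end of the pipe; EOF/hangup means the child is gone */
};

/* Start "sh -c command" with an inherited pipe. Returns 0 on success, -1 on error. */
int spawn_shell(const char *command, struct child *c) {
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        close(fds[0]);
        /* fds[1] is deliberately left open across exec: the kernel closes
         * it when the process exits, which wakes up the parent's poll(). */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);
    }
    close(fds[1]);   /* the parent must not keep a writer open */
    c->pid = pid;
    c->exit_fd = fds[0];
    return 0;
}

/* Wait up to timeout_ms for the child to exit.
 * Returns 1 and fills *status if reaped, 0 on timeout, -1 on error. */
int wait_child(struct child *c, int timeout_ms, int *status) {
    struct pollfd pfd = { .fd = c->exit_fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;
    close(c->exit_fd);
    /* Reap exactly this PID; no SIGCHLD handler is involved. */
    return waitpid(c->pid, status, 0) == c->pid ? 1 : -1;
}
```

Because exit_fd is an ordinary descriptor, many children can be multiplexed in one poll() call. One caveat in a multithreaded program: a fork() in another thread between pipe() and close(fds[1]) leaks the write end into that other child and delays the end-of-file.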
Since you have threads you cannot control, I recommend you write a preloaded library to interpose the system() call (and perhaps also popen() etc.) with your own implementation. I'd also include your SIGCHLD handler in the library, too.
If you don't want to run your program via env LD_PRELOAD=libwhatever.so yourprogram, you can add code at the start of your program to have it re-execute itself with LD_PRELOAD appropriately set. (Note that there are quirks to consider if your program is setuid or setgid; see man ld.so for details. In particular, if libwhatever.so is not installed in a system library directory, you must specify a full path.)
One possible approach would be to use a lockless array (using atomic built-ins provided by the C compiler) of pending children. Instead of calling waitpid(), your system() implementation allocates one of the entries, sticks the child PID in there, and waits on a semaphore for the child to exit.
Here is an example implementation:
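What follows is a much-reduced sketch of the slot-and-semaphore idea, not the full listing this answer refers to. It has no communications socket and no exec() failure reporting. Instead, it closes the race between child exit and PID publication by re-checking the slot by hand after storing the PID. The names tracked_init, tracked_system, slots, and MAX_PENDING are hypothetical. To interpose, the function would have to be exported as system() and the initializer run from a library constructor. The SIGINT/SIGQUIT handling and NULL-command case of the real system() are omitted.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_PENDING 32

struct pending {
    atomic_int pid;     /* 0 = free, -1 = claimed (no live PID), >0 = child PID */
    int        status;  /* written by whoever reaps, before sem_post() */
    sem_t      done;    /* posted once the child has been reaped */
};

static struct pending slots[MAX_PENDING];

/* Reap the child in one slot if it has exited. Uses only atomics,
 * waitpid() and sem_post(), so it may run inside a signal handler. */
void try_reap(struct pending *p) {
    int status;
    int pid = atomic_load(&p->pid);
    if (pid > 0 && waitpid(pid, &status, WNOHANG) == pid) {
        atomic_store(&p->pid, -1);   /* PID is dead: stop polling it */
        p->status = status;
        sem_post(&p->done);
    }
}

void tracked_sigchld(int signo) {
    (void)signo;
    int saved_errno = errno;
    for (int i = 0; i < MAX_PENDING; i++)
        try_reap(&slots[i]);         /* only tracked PIDs are ever reaped */
    errno = saved_errno;
}

/* Call once, before any thread uses tracked_system(). */
int tracked_init(void) {
    struct sigaction sa;
    for (int i = 0; i < MAX_PENDING; i++)
        if (sem_init(&slots[i].done, 0, 0) == -1)
            return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = tracked_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Like system(command), but waits on a semaphore instead of waitpid(). */
int tracked_system(const char *command) {
    struct pending *p = NULL;
    for (int i = 0; i < MAX_PENDING && p == NULL; i++) {
        int expected = 0;
        if (atomic_compare_exchange_strong(&slots[i].pid, &expected, -1))
            p = &slots[i];           /* slot claimed without a lock */
    }
    if (p == NULL) {
        errno = EAGAIN;
        return -1;
    }
    pid_t pid = fork();
    if (pid == -1) {
        int saved_errno = errno;
        atomic_store(&p->pid, 0);
        errno = saved_errno;
        return -1;
    }
    if (pid == 0) {
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127);
    }
    atomic_store(&p->pid, (int)pid);
    /* The child may have exited, and SIGCHLD been delivered, before the
     * PID was published above; check once by hand to cover that case. */
    try_reap(p);
    while (sem_wait(&p->done) == -1 && errno == EINTR)
        ;
    int status = p->status;
    atomic_store(&p->pid, 0);        /* release the slot */
    return status;
}
```

Unlike a handler that calls waitpid(-1, ...), this one never touches children it was not told about, so it does not reap children on behalf of other code. On older glibc versions the semaphore functions require linking with -pthread.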
If you save the above as, say, child.c, you can compile it into a shared library, libchild.so.
If you have a test program that does system() calls in various threads, you can run it with system() interposed (and children automatically reaped) by preloading libchild.so via LD_PRELOAD, as described above.
Note that depending on exactly what those threads that are not under your control do, you may need to interpose further functions, including signal(), sigaction(), sigprocmask(), pthread_sigmask(), and so on, to make sure those threads do not change the disposition of your SIGCHLD handler (after it is installed by the libchild.so library).
If those out-of-control threads use popen(), you can interpose that (and pclose()) with very similar code to system() above, just split into two parts.
(If you are wondering why my system() code bothers to report the exec() failure to the parent process, it's because I normally use a variant of this code that takes the command as an array of strings; this way it correctly reports if the command was not found, or could not be executed due to insufficient privileges, etc. In this particular case the command is always /bin/sh. However, since the communications socket is needed anyway to avoid racing between child exit and having an up-to-date PID in the child_pid[] array, I decided to leave the "extra" code in.)
Replace system() with proc_system().