Question:
I have a program that spawns and communicates with CPU-heavy, unstable processes that I did not write. If my app crashes or is killed by SIGKILL, I want the subprocesses to be killed as well, so the user doesn't have to track them down and kill them manually.
I know this topic has been covered before, but I have tried all the methods described, and none of them seems to survive the test.
I know it must be possible, since terminals do it all the time. If I run something in a terminal, and kill the terminal, the stuff always dies.
I have tried atexit, double fork, and ptys. atexit doesn't work for SIGKILL; double fork doesn't work at all; and I have found no way to work with ptys from Python.
Today, I found out about prctl(PR_SET_PDEATHSIG, SIGKILL), which should be a way for child processes to order a kill on themselves when their parent dies.
I tried to use it with Popen, but it seems to have no effect at all:
import ctypes, subprocess
libc = ctypes.CDLL('/lib/libc.so.6')
PR_SET_PDEATHSIG = 1; TERM = 15
implant_bomb = lambda: libc.prctl(PR_SET_PDEATHSIG, TERM)
subprocess.Popen(['gnuchess'], preexec_fn=implant_bomb)
In the above, the child is created and the parent exits. Now you would expect gnuchess to receive a SIGTERM and die, but it doesn't. I can still find it in my process manager using 100% CPU.
Can anybody tell me if there is something wrong with my use of prctl, or do you know how terminals manage to kill their children?
Answer 1:
prctl's PR_SET_PDEATHSIG can only be set for the very process that calls prctl, not for any other process, including that process's children. The man page I'm pointing to expresses this as "This value is cleared upon a fork()", and fork, of course, is the way other processes are spawned (in Linux and any other Unix-y OS).
If you have no control over the code you want to run in subprocesses (as would be the case, essentially, for your gnuchess example), I suggest you first spawn a separate small "monitor" process with the role of keeping track of all of its siblings (your parent process can let the monitor know about those siblings' pids as it spawns them) and sending them killer signals when the common parent dies (the monitor needs to poll for that, waking up every N seconds for some N of your choice to check if the parent's still alive; use select to wait for more info from the parent with a timeout of N seconds, within a loop).
Not trivial, but then such system tasks often aren't. Terminals do it differently (via the concept of a "controlling terminal" for a process group) but of course it's trivial for any child to block THAT off (double forks, nohup, and so on).
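The monitor idea above could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the answerer's code: the names monitor_loop, start_monitor, and register are made up for illustration, PIDs travel over a pipe as newline-terminated decimal text, and the monitor treats either end-of-file on that pipe or a change in os.getppid() as the parent's death.

```python
import os
import select
import signal


def monitor_loop(read_fd, parent_pid, poll_seconds=2.0, sig=signal.SIGTERM):
    """Collect sibling PIDs from read_fd; signal them once the parent is gone."""
    pids = set()
    buffer = b""
    parent_alive = True
    while parent_alive:
        # Wait up to poll_seconds for the parent to send more PIDs.
        ready, _, _ = select.select([read_fd], [], [], poll_seconds)
        if ready:
            chunk = os.read(read_fd, 4096)
            if not chunk:
                # EOF: every write end of the pipe has been closed.
                parent_alive = False
            buffer += chunk
            *lines, buffer = buffer.split(b"\n")
            pids.update(int(line) for line in lines if line.strip())
        # Once the parent dies the monitor is re-parented, so getppid() changes.
        if os.getppid() != parent_pid:
            parent_alive = False
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # that sibling has already exited
    return pids


def start_monitor(poll_seconds=2.0):
    """Fork the monitor; return a callable the parent uses to report child PIDs."""
    parent_pid = os.getpid()
    read_fd, write_fd = os.pipe()
    if os.fork() == 0:
        # Monitor process: keep only the read end of the pipe.
        os.close(write_fd)
        monitor_loop(read_fd, parent_pid, poll_seconds)
        os._exit(0)
    os.close(read_fd)

    def register(pid):
        os.write(write_fd, b"%d\n" % pid)

    return register
```

The parent would call register(proc.pid) after each spawn. Because PIDs can be reused by the kernel, a long-lived monitor may want to drop PIDs of children that are known to have exited.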
Answer 2:
I know it's been years, but I found a simple (slightly hacky) solution to this problem. From your parent process, wrapping all your calls around a very simple C program that calls prctl() and then exec() solves this problem on Linux. I call it "yeshup":
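#include <signal.h>
#include <sys/prctl.h>  /* declares prctl() and PR_SET_PDEATHSIG */
#include <unistd.h>

int main(int argc, char **argv) {
    if (argc < 2)
        return 1;
    /* Ask the kernel to send SIGHUP to this process when its parent dies. */
    prctl(PR_SET_PDEATHSIG, SIGHUP, 0, 0, 0);
    /* Replace this process with the requested command. */
    return execvp(argv[1], &argv[1]);
}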
When spawning your child processes from Python (or any other language), you can run "yeshup gnuchess [arguments]". You'll find that, when the parent process is killed, all your child processes should be sent SIGHUP.
This works because Linux will honor the call to prctl (not clear it) even after execvp is called (which effectively "transforms" the yeshup process into a gnuchess process, or whatever command you specify there), unlike fork().
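For readers who would rather not compile a helper, the same set-then-exec pattern might look like this in Python. It is only a sketch: set_pdeathsig and exec_with_pdeathsig are illustrative names, the constant 1 is PR_SET_PDEATHSIG from <linux/prctl.h>, and the code assumes a Linux libc that exports prctl.

```python
import ctypes
import os
import signal

PR_SET_PDEATHSIG = 1  # constant from <linux/prctl.h>


def set_pdeathsig(sig=signal.SIGHUP):
    """Ask the kernel to send `sig` to the calling process when its parent dies."""
    libc = ctypes.CDLL(None, use_errno=True)  # assumes a Linux libc exporting prctl
    zero = ctypes.c_ulong(0)
    result = libc.prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(sig)), zero, zero, zero)
    if result != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def exec_with_pdeathsig(argv, sig=signal.SIGHUP):
    """Set the parent-death signal, then become argv[0]; returns only by raising."""
    set_pdeathsig(sig)
    os.execvp(argv[0], argv)
```

A hypothetical yeshup.py would simply call exec_with_pdeathsig(sys.argv[1:]) and be used in place of the C binary, at the cost of starting a Python interpreter for every child.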
Answer 3:
Actually I found that your original approach worked just fine for me - here's the exact example code I tested with which worked:
echoer.py
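#!/bin/env python
import time
import sys

i = 0
try:
    # Print a counter once per second until interrupted.
    while True:
        i += 1
        print(i)
        time.sleep(1)
except KeyboardInterrupt:
    # SIGINT arrives here when the parent-death signal fires.
    print("\nechoer caught KeyboardInterrupt")
    sys.exit(0)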
parentProc.py
#!/bin/env python
import ctypes
import subprocess
import time

libc = ctypes.CDLL('/lib64/libc.so.6')
PR_SET_PDEATHSIG = 1
SIGINT = 2
SIGTERM = 15

def set_death_signal(signal):
    # Runs in the child, between fork() and exec().
    libc.prctl(PR_SET_PDEATHSIG, signal)

def set_death_signal_int():
    set_death_signal(SIGINT)

def set_death_signal_term():
    set_death_signal(SIGTERM)

#subprocess.Popen(['./echoer.py'], preexec_fn=set_death_signal_term)
subprocess.Popen(['./echoer.py'], preexec_fn=set_death_signal_int)
time.sleep(1.5)
print("parentProc exiting...")
Answer 4:
I thought the double fork was to detach from a controlling terminal. I'm not sure how you are trying to use it.
It's a hack, but you could always call 'ps' and search for the name of the process you're trying to kill.
Answer 5:
I've seen very nasty ways of "clean-up" using things like ps xuawww | grep myApp | awk '{ print $1}' | xargs -n1 kill -9
The client process, if started via popen, can catch SIGPIPE and die. There are many ways to go about this, but it really depends on a lot of factors. If you throw some ping code (ping to parent) in the child, you can ensure that a SIGPIPE is issued on death. If it catches it, which it should, it'll terminate. You'd need bidirectional communication for this to work correctly... or to always block against the client as the originator of communication. If you don't want to modify the child, ignore this.
Assuming that you don't expect the actual Python interpreter to segfault, you could add each PID to a sequence, and then kill on exit. This should be safe for exiting and even uncaught exceptions. Python has facilities to perform exit code... for clean-up.
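A minimal sketch of that "remember each child, kill on exit" idea, using the standard atexit module. The helper names spawn and kill_children are made up for illustration, and, as the question already notes, nothing registered with atexit runs if the master itself is killed by SIGKILL.

```python
import atexit
import signal
import subprocess

_children = []  # every child started through spawn()


def spawn(args):
    """Start a child process and remember it for clean-up at exit."""
    proc = subprocess.Popen(args)
    _children.append(proc)
    return proc


def kill_children(sig=signal.SIGTERM):
    """Signal every tracked child that is still running."""
    for proc in _children:
        if proc.poll() is None:
            proc.send_signal(sig)


# Runs on normal interpreter exit and after uncaught exceptions,
# but not when the interpreter is killed by SIGKILL or crashes hard.
atexit.register(kill_children)
```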
Here's some safer nasty: Append each child PID to a file, including your master process (separate file). Use file locking. Build a watchdog daemon that looks at the flock() state of your master pid. If it's not locked, kill every PID in your child PID list. Run the same code on startup.
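The flock()-based watchdog described above might be sketched as follows. All names here (master_is_alive, kill_listed_children, watchdog) and the file layout (one PID per line) are assumptions made for illustration. The master is assumed to hold an exclusive flock() on its own PID file for as long as it lives; the kernel releases that lock when the master dies, however it dies.

```python
import fcntl
import os
import signal
import time


def master_is_alive(lock_path):
    """Return True while some process still holds an flock() on lock_path."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True  # lock is held, so the master is still running
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False
    finally:
        os.close(fd)


def kill_listed_children(pid_file, sig=signal.SIGKILL):
    """Signal every PID listed (one per line) in pid_file."""
    with open(pid_file) as handle:
        pids = [int(line) for line in handle if line.strip()]
    for pid in pids:
        try:
            os.kill(pid, sig)
        except ProcessLookupError:
            pass  # that child has already exited
    return pids


def watchdog(lock_path, pid_file, poll_seconds=2.0):
    """Poll the master's lock; once it is released, kill the listed children."""
    while master_is_alive(lock_path):
        time.sleep(poll_seconds)
    return kill_listed_children(pid_file)
```

One caveat with this design: an flock() lock belongs to the open file description, so a child that inherits the master's locked descriptor keeps the lock alive. The master should therefore open its PID file close-on-exec (the default for descriptors created by Python 3).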
More nasty: Write the PIDs to files, as above, then invoke your app in a sub-shell: (./myMaster; ./killMyChildren)
Answer 6:
I'm wondering if the PR_SET_PDEATHSIG flag is getting cleared, even though you set it after the fork (and before the exec), so from the docs it seems like it shouldn't get cleared.
In order to test that theory, you could try the following: use the same code to run a subprocess that's written in C and basically just calls prctl(PR_GET_PDEATHSIG, &result) and prints the result.
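The same check can be sketched without a C compiler by letting a small Python child stand in for the C program. This is an illustration only: CHILD_SOURCE and read_back_pdeathsig are made-up names, the constants 1 and 2 are PR_SET_PDEATHSIG and PR_GET_PDEATHSIG from <linux/prctl.h>, and the code assumes a Linux libc that exports prctl.

```python
import ctypes
import signal
import subprocess
import sys

PR_SET_PDEATHSIG = 1  # constants from <linux/prctl.h>
PR_GET_PDEATHSIG = 2

libc = ctypes.CDLL(None, use_errno=True)  # assumes a Linux libc exporting prctl


def implant_bomb():
    # Runs in the child, after fork() and before exec().
    zero = ctypes.c_ulong(0)
    libc.prctl(PR_SET_PDEATHSIG, ctypes.c_ulong(int(signal.SIGTERM)), zero, zero, zero)


# Child program: read back its own parent-death signal and print it.
CHILD_SOURCE = """
import ctypes
value = ctypes.c_int(0)
ctypes.CDLL(None).prctl(2, ctypes.byref(value))  # 2 == PR_GET_PDEATHSIG
print(value.value)
"""


def read_back_pdeathsig():
    """Spawn the child with preexec_fn=implant_bomb and return what it reports."""
    out = subprocess.check_output(
        [sys.executable, "-c", CHILD_SOURCE], preexec_fn=implant_bomb
    )
    return int(out.decode().strip())
```

If read_back_pdeathsig() returns 0, the setting did not reach the exec'd program; a non-zero value is the signal number the child will receive when its parent dies.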
Another thing you might try: adding explicit zeros for arg3, arg4, and arg5 when you call prctl, i.e.:
>>> implant_bomb = lambda: libc.prctl(PR_SET_PDEATHSIG, TERM, 0, 0, 0)
Answer 7:
There are some security restrictions to take into account: if setuid is called after execv, the child cannot receive the signal. The complete list of these restrictions is here
good luck !
/Mohamed