Recursively kill an R process and its children in Linux

Posted 2019-03-14 05:44

Question:

I am looking for a general method to launch and then kill an R process, including possibly all forks or other processes that it invoked.

For example, a user runs a script like this:

library(multicore);
for(i in 1:3) parallel(foo <- "bar");
for(i in 1:3) system("sleep 300", wait=FALSE);
for(i in 1:3) system("sleep 300&");
q("no")

After the user quits the R session, the child processes are still running:

jeroen@jeroen-ubuntu:~$ ps -ef | grep R
jeroen    4469     1  0 16:38 pts/1    00:00:00 /usr/lib/R/bin/exec/R
jeroen    4470     1  0 16:38 pts/1    00:00:00 /usr/lib/R/bin/exec/R
jeroen    4471     1  0 16:38 pts/1    00:00:00 /usr/lib/R/bin/exec/R
jeroen    4502  4195  0 16:39 pts/1    00:00:00 grep --color=auto R
jeroen@jeroen-ubuntu:~$ ps -ef | grep "sleep"
jeroen    4473     1  0 16:38 pts/1    00:00:00 sleep 300
jeroen    4475     1  0 16:38 pts/1    00:00:00 sleep 300
jeroen    4477     1  0 16:38 pts/1    00:00:00 sleep 300
jeroen    4479     1  0 16:38 pts/1    00:00:00 sleep 300
jeroen    4481     1  0 16:38 pts/1    00:00:00 sleep 300
jeroen    4483     1  0 16:38 pts/1    00:00:00 sleep 300
jeroen    4504  4195  0 16:39 pts/1    00:00:00 grep --color=auto sleep

To make things worse, their parent process ID is 1, which makes them hard to identify. Is there a method to run an R script in a way that allows me to recursively kill the process and its children at any time?

Edit: I don't want to have to go in and manually search for and kill processes. I also don't want to kill all R processes, as there might be others that are doing fine. I need a method to kill a specific process and all of its children.

Answer 1:

This is mainly about the multicore part. Children are waiting for you to collect the results - see ?collect. Normally, you should never use parallel without a provision to clean up, typically in on.exit. multicore cleans up in high-level functions like mclapply, but if you use lower-level functions it is your responsibility to perform the cleanup (since multicore cannot know if you left the children running intentionally or not).

Your example is really bogus, because you don't even consider collecting results. But anyway, if that is really what you want, you'll have to do the cleanup at some point. For example, if you want to terminate all children on exit, you could define .Last like this:

.Last <- function(...) {
    # collect any results that are already available, without blocking
    collect(wait = FALSE)
    # whatever is still listed as a child is still running
    all <- children()
    if (length(all)) {
        # ask the children to terminate, then collect them so they are reaped
        kill(all, SIGTERM)
        collect(all)
    }
}

Again, the above is not a recommended way to deal with this - it is rather a last resort. You should really assign jobs and collect the results, like this:

jobs <- lapply(1:3, function(i) parallel({Sys.sleep(i); i}))
collect(jobs)

As for the general child-process question: init inherits the children only after R quits, but in .Last you can still find their PIDs, since the parent process still exists at that point, so you could perform a cleanup similar to the multicore case above.



Answer 2:

Before the user quits the R session, the processes you want to kill will have parent process ID equal to the process ID of the session that started them. You could perhaps use the .Last or .Last.sys hooks (see help(q)) to kill all processes with the appropriate PPID at that point; those can be suppressed with q(runLast=FALSE), so it isn't perfect, but I think it's the best option you have.
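
A minimal sketch of that hook, assuming a Linux system where pkill is available (the .Last idea is from this answer; the exact command is my own illustration):

.Last <- function() {
    # at this point the R session still exists, so processes it spawned
    # still report our PID as their parent; ask them all to terminate
    system(sprintf("pkill -TERM -P %d", Sys.getpid()))
}

Anything that has already been re-parented to init (for example a grandchild whose parent exited early) will not be caught by this, for the reasons described below.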

After the user quits the R session, there is no reliable way to do what you want -- the only record the kernel keeps of process parentage is the PPID you see in ps -ef, and when a parent process exits, that information is destroyed, as you have discovered.

Note that if one of the child processes forks, the grandchild will have a PPID equal to the child's PID, and that will be reset to 1 when the child exits, which it might do before the grandparent exits. Thus, there is no reliable way to catch all of a process's descendants in general, even if you do so before the process exits. (cgroups reportedly provide a way, but I'm not familiar with the details; in any case, they are an optional feature that only some configurations of the Linux kernel provide, and they are not available at all elsewhere.)



Answer 3:

I believe the latter part of the question is more a consideration of the shell than of the kernel. (Simon Urbanek has answered the multicore part better than pretty much anyone else could, as he's the author. :))

If you're using bash, you can find the PID of the most recently launched background process in $!. You can aggregate these PIDs and then make sure to kill them off when you close R.

If you want to be really gonzo, you could store parent PID (i.e. the output of Sys.getpid()) and child PID in a file and have a cleaning daemon that checks whether or not the parent PID exists and, if not, kills the orphans. I don't think it'll be that easy to get a package called oRphanKilleR onto CRAN, though.

Here is an example of appending the child PID to a file:

system('(sleep 20) & echo $! >> ~/childPIDs.txt', wait = FALSE)

You can modify this to create your own shell command, and you can use R's tempfile() function to create a temporary file (although that file will disappear when the R instance terminates, unless you make a special effort to preserve it via its permissions).
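
For example, a small wrapper along these lines keeps the PID bookkeeping in one place (pid_file and launch_bg are made-up names for illustration; tools::pskill is used here as the R-level way to send the signal):

pid_file <- path.expand("~/childPIDs.txt")

launch_bg <- function(cmd) {
    # run cmd in the background and append the PID the shell reports in $!
    system(sprintf("(%s) & echo $! >> %s", cmd, shQuote(pid_file)), wait = FALSE)
}

launch_bg("sleep 300")

# later (e.g. from .Last), read the recorded PIDs back and signal them from R
pids <- as.integer(readLines(pid_file))
tools::pskill(pids, tools::SIGTERM)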

For some other clever ideas, see this other post on SO.

You can also create a loop in the shell that checks whether a particular PID still exists; while it does, the loop sleeps. Once the loop terminates (because the PID is no longer in use), the script kills another PID.
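
A sketch of that idea, launched from R in the same style as the example above (the file name, polling interval, and nohup/sh -c wrapping are my assumptions, not part of the answer):

pid_file <- path.expand("~/childPIDs.txt")

# wait until this R session's PID disappears, then kill whatever is recorded
watcher <- sprintf(
    "while kill -0 %d 2>/dev/null; do sleep 5; done; kill $(cat %s) 2>/dev/null; rm -f %s",
    Sys.getpid(), shQuote(pid_file), shQuote(pid_file))

# nohup detaches the watcher so it outlives the R session it is watching
system(sprintf("nohup sh -c %s >/dev/null 2>&1", shQuote(watcher)), wait = FALSE)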

Basically, I think your solution will be in shell scripting, rather than R.