R parallel computing and zombie processes

Posted 2019-03-13 07:07

Question:

This is basically a follow-up to this more specialized question. There have been some posts about the creation of zombie processes when doing parallel computing in R:

  1. How to stop R from leaving zombie processes behind
  2. How to kill a doMC worker when it's done?
  3. Remove zombie processes using parallel package

There are several ways of doing parallel computing, and I will focus on the three that I have used so far on a local machine. I used doMC and doParallel with the foreach package on a local computer with 4 cores:

(a) Registering a fork cluster:

library(doParallel)
cl <- makeForkCluster(4)
# equivalently here: cl <- makeForkCluster(nnodes=getOption("mc.cores", 4L))
registerDoParallel(cl)
out <- foreach(i=1:1000, .combine = "c") %dopar% {
    print(i)
}
stopCluster(cl)

(b) Registering a PSOCK cluster:

library(doParallel)
cl <- makePSOCKcluster(4)
registerDoParallel(cl)
out <- foreach(i=1:1000, .combine = "c") %dopar% {
    print(i)
}
stopCluster(cl)

(c) Using doMC

library(doMC)
library(doParallel)
registerDoMC(4)
out <- foreach(i=1:1000, .combine = "c") %dopar% {
    print(i)
}

Several users have observed that the doMC method leaves zombie processes behind. doMC is just a wrapper around the mclapply function, so this is not doMC's fault (see here: How to kill a doMC worker when it's done?). In an answer to a previous question (How to stop R from leaving zombie processes behind) it was suggested that using a fork cluster might not leave zombie processes behind. In another question (Remove zombie processes using parallel package) it was suggested that using a PSOCK cluster might not leave zombie processes behind. However, it seems that all three methods leave zombie processes behind.

Zombie processes per se are usually not a problem because they do not (normally) bind resources, but they clutter the process tree. I can get rid of them by closing and re-opening R, but that is not the best option when I'm in the middle of a session. Is there an explanation for why this happens (or even: is there a reason why this has to happen)? And is there something that can be done so that no zombie processes are left behind?

My system info (R is used in a simple repl session with xterm and tmux):

> library(devtools)
> session_info()
Session info-------------------------------------------------------------------
 setting  value                                             
 version  R Under development (unstable) (2014-08-16 r66404)
 system   x86_64, linux-gnu                                 
 ui       X11                                               
 language (EN)                                              
 collate  en_IE.UTF-8                                       
 tz       <NA>                                              

Packages-----------------------------------------------------------------------
 package    * version  source          
 codetools    0.2.8    CRAN (R 3.2.0)  
 devtools   * 1.5.0.99 Github (c429ae2)
 digest       0.6.4    CRAN (R 3.2.0)  
 doMC       * 1.3.3    CRAN (R 3.2.0)  
 evaluate     0.5.5    CRAN (R 3.2.0)  
 foreach    * 1.4.2    CRAN (R 3.2.0)  
 httr         0.4      CRAN (R 3.2.0)  
 iterators  * 1.0.7    CRAN (R 3.2.0)  
 memoise      0.2.1    CRAN (R 3.2.0)  
 RCurl        1.95.4.3 CRAN (R 3.2.0)  
 rstudioapi   0.1      CRAN (R 3.2.0)  
 stringr      0.6.2    CRAN (R 3.2.0)  
 whisker      0.3.2    CRAN (R 3.2.0)  

Small edit: At least for makeForkCluster() it seems that sometimes the forks it spawns get killed and reaped by the parent correctly and sometimes they do not get reaped and become zombies. It seems this only happens when the cluster is not closed fast enough after the loop is aborted or finished; at least that is when it happened the last few times.
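
If the timing of the cluster shutdown really is the trigger, one thing that could be tried is to tie stopCluster() to the function that runs the loop, so that the cluster is closed immediately whether the loop finishes or is aborted. The following is only a sketch under that assumption, using the base parallel package rather than foreach; with_fork_cluster and n_workers are hypothetical names introduced here, and it is not confirmed that this avoids zombies in every case.

```r
library(parallel)

# Hypothetical helper (not part of any package): runs FUN over X on a
# fork cluster and closes the cluster as soon as the function exits,
# including when the computation errors or is interrupted.
with_fork_cluster <- function(X, FUN, n_workers = 2L) {
  cl <- makeForkCluster(n_workers)
  # on.exit() runs on normal return, on error, and on user interrupt
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, X, FUN)
}

out <- unlist(with_fork_cluster(1:8, function(i) i^2))
```

The same on.exit(stopCluster(cl)) pattern could be placed around a registerDoParallel(cl) / foreach block inside a function.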

Answer 1:

You could get rid of the zombie processes using the "inline" package. Just implement a function that calls "waitpid":

library(inline)
includes <- '#include <sys/wait.h>'
code <- 'int wstat; while (waitpid(-1, &wstat, WNOHANG) > 0) {};'
wait <- cfunction(body=code, includes=includes, convention='.C')

I tested this by first creating some zombies with the mclapply function:

> library(parallel)
> pids <- unlist(mclapply(1:4, function(i) Sys.getpid(), mc.cores=4))
> system(paste0('ps --pid=', paste(pids, collapse=',')))
  PID TTY          TIME CMD
17447 pts/4    00:00:00 R <defunct>
17448 pts/4    00:00:00 R <defunct>
17449 pts/4    00:00:00 R <defunct>
17450 pts/4    00:00:00 R <defunct>

(Note that I'm using the GNU version of "ps" which supports the "--pid" option.)

Then I called my "wait" function and called "ps" again to verify that the zombies are gone:

> wait()
list()
> system(paste0('ps --pid=', paste(pids, collapse=',')))
  PID TTY          TIME CMD

It appears that the worker processes created by mclapply are now gone. This should work as long as the processes were created by the current R process.
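
To check for leftover zombies without knowing the worker PIDs in advance, the session's own children can be listed and filtered by process state. The following is a rough sketch that assumes the same GNU "ps" as above (it relies on the "--ppid" option); list_zombie_children is a hypothetical helper name, not something provided by the parallel or inline packages.

```r
# Hypothetical helper: returns the PIDs of child processes of `ppid`
# whose state starts with "Z" (zombie / defunct).
# Assumes GNU ps (procps); returns integer(0) if ps fails or finds none.
list_zombie_children <- function(ppid = Sys.getpid()) {
  out <- suppressWarnings(system2(
    "ps",
    c("--ppid", as.character(ppid), "-o", "pid=", "-o", "stat="),
    stdout = TRUE, stderr = FALSE
  ))
  # Each output line has the form "<pid> <stat>", with no header line
  fields <- strsplit(trimws(out), "[[:space:]]+")
  fields <- Filter(function(f) length(f) == 2L, fields)
  is_zombie <- vapply(fields, function(f) startsWith(f[2], "Z"), logical(1))
  as.integer(vapply(fields[is_zombie], `[`, character(1), 1L))
}

zombies <- list_zombie_children()
```

Calling list_zombie_children() before and after wait() should show whether the reaping worked for the current session.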