An error in one job contaminates others with mclap

Posted 2019-05-07 03:55

Question:

When mclapply(X, FUN) encounters errors for some of the values of X, the errors propagate to some (but not all) of the other values of X:

library(parallel)
# Toy function: fails for x == 3, returns x otherwise
test <- function(x) if (x == 3) stop() else x
mclapply(1:3, test, mc.cores = 2)

#[[1]]
#[1] "Error in FUN(c(1L, 3L)[[2L]], ...[cut]
#
#[[2]]
#[1] 2
#
#[[3]]
#[1] "Error in FUN(c(1L, 3L)[[2L]], ... [cut]

#Warning message:
#In mclapply(1:3, test, mc.cores = 2) :
#  scheduled core 1 encountered error in user code, all values of the job will be affected

How can I stop this happening?

Answer 1:

The trick is to set mc.preschedule = FALSE:

mclapply(1:3, test, mc.cores = 2, mc.preschedule = FALSE)
#[[1]]
#[1] 1

#[[2]]
#[1] 2

#[[3]]
#[1] "Error in FUN(X[[nexti]], ...[cut]
#Warning message:
#In mclapply(1:3, test, mc.cores = 2, mc.preschedule = FALSE) :
#  1 function calls resulted in an error

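This works because, by default (mc.preschedule = TRUE), mclapply divides X into at most mc.cores groups and runs each group as a single job in one forked process. If any member of a group raises an error, the whole job fails and every value in that group is returned as the same error; values in other groups are unaffected. In the question's example, core 1 received elements 1 and 3 (hence c(1L, 3L) in the error message), so the failure for 3 also replaced the result for 1. With mc.preschedule = FALSE, a separate job is forked for each value of X, so an error stays confined to the value that caused it.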
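To make the grouping concrete, here is a rough sequential sketch of what prescheduling appears to do. It does not fork anything and is not mclapply's actual code; the round-robin index split and the names idx, chunk_results and res are assumptions chosen to match the c(1L, 3L) seen in the error message.

```r
# Same toy function as in the question
test <- function(x) if (x == 3) stop() else x

X <- 1:3
cores <- 2

# Assumed round-robin split: core 1 gets indices 1 and 3, core 2 gets index 2
idx <- lapply(seq_len(cores), function(i) seq(i, length(X), by = cores))

# Each group is evaluated as one unit; a single error replaces the whole group
chunk_results <- lapply(idx, function(i) try(lapply(X[i], test), silent = TRUE))

# Copy each group's outcome back to the positions it covered
res <- vector("list", length(X))
for (k in seq_along(idx)) {
  out <- chunk_results[[k]]
  for (j in seq_along(idx[[k]])) {
    res[[idx[[k]][j]]] <- if (inherits(out, "try-error")) out else out[[j]]
  }
}

str(res, max.level = 1)
```

In this simulation, positions 1 and 3 both end up holding the same try-error object while position 2 keeps its value, which mirrors the output shown in the question.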

Setting mc.preschedule = FALSE has drawbacks, though. Forking one process per value adds overhead when X is long or each call is short, and it may make it impossible to reproduce a sequence of pseudo-random numbers in which the same job always receives the same number in the sequence; see ?mcparallel under the heading Random numbers.
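If those drawbacks matter, one possible alternative (not part of the original answer) is to keep the default prescheduling and catch the error inside the worker function, so that a failure never escapes to the job level. The sketch below assumes a Unix-like system, since mclapply relies on forking; safe_test and failed are hypothetical names introduced for illustration.

```r
library(parallel)

# Same toy function as in the question
test <- function(x) if (x == 3) stop() else x

# Hypothetical wrapper: return the error condition as a value instead of raising it
safe_test <- function(x) tryCatch(test(x), error = function(e) e)

# Prescheduling stays at its default; mc.cores > 1 requires a forking platform
res <- mclapply(1:3, safe_test, mc.cores = 2)

# Flag the elements whose call failed
failed <- vapply(res, inherits, logical(1), what = "error")
print(failed)
```

Here each failed element holds a condition object rather than an error string, so something like conditionMessage(res[[3]]) can be used to inspect it, and the successful values sharing a group with it are left untouched.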



Tags: r fork mclapply