mclapply returns NULL randomly

Posted 2019-01-23 15:02

Question:

When I am using mclapply, from time to time (really randomly) it gives incorrect results. The problem is described quite thoroughly in other posts across the Internet, e.g. http://r.789695.n4.nabble.com/Bug-in-mclapply-td4652743.html. However, no solution is provided. Does anyone know how to fix this problem? Thank you!

Answer 1:

The problem reported by Winston Chang that you cite appears to have been fixed in R 2.15.3. There was a bug in mccollect that occurred when assigning the worker results to the result list:

if (is.raw(r)) res[[which(pid == pids)]] <- unserialize(r)

This fails if unserialize(r) returns a NULL, since assigning a NULL to a list in this way deletes the corresponding element of the list. This was changed in R 2.15.3 to:

if (is.raw(r)) # unserialize(r) might be null
    res[which(pid == pids)] <- list(unserialize(r))

which is a safe way to assign an unknown value to a list.
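
To see the difference concretely, here is a small illustration (my addition, not from the original report) of how NULL behaves under the two forms of list assignment:

x <- as.list(1:3)
x[[2]] <- NULL          # [[<- with NULL deletes the element
length(x)               # now 2

y <- as.list(1:3)
y[2] <- list(NULL)      # [<- with list(NULL) keeps the slot and stores NULL
length(y)               # still 3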

So if you're using R <= 2.15.2, the solution is to upgrade to R >= 2.15.3. If you have a problem using R >= 2.15.3, then presumably it's a different problem than the one reported by Winston Chang.


I also read over the issues discussed in the R-help thread started by Elizabeth Purdom. Without a specific test case, my guess is that the problem is not due to a bug in mclapply because I can reproduce the same symptoms with the following function:

work <- function(i, poison) {
  # simulate a worker crash: exit the forked process when i matches 'poison'
  if (i == poison) quit(save='no')
  i
}

If a worker started by mclapply dies while executing a task for any reason (receiving a signal, seg faulting, exiting), mclapply will return a NULL for all of the tasks that were assigned to that worker:

> library(parallel)
> mclapply(1:4, work, 3, mc.cores=2)
[[1]]
NULL

[[2]]
[1] 2

[[3]]
NULL

[[4]]
[1] 4

In this case, NULLs were returned for tasks 1 and 3 due to prescheduling, even though only task 3 actually failed.
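
Prescheduling is controlled by mclapply's mc.preschedule argument. As a rough sketch (using the same toy work function, and not something from the original answer), disabling prescheduling gives each input its own forked process, so only the task whose process actually dies should come back as NULL; recent R versions may also warn about a job that did not deliver a result:

library(parallel)
# each element of 1:4 gets its own fork, so only i == 3 should be lost
res <- mclapply(1:4, work, 3, mc.cores = 2, mc.preschedule = FALSE)
which(vapply(res, is.null, logical(1)))   # expected: 3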

If a worker dies when using a function such as parLapply or clusterApply, an error is reported:

> cl <- makePSOCKcluster(3)
> parLapply(cl, 1:4, work, 3)
Error in unserialize(node$con) : error reading from connection

I've seen many such reports, and I think they tend to come from large programs that use many packages, which makes them hard to turn into reproducible test cases.

Of course, in this example, you'll also get an error when using lapply, although the error won't be hidden as it is with mclapply. If the problem doesn't seem to happen when using lapply, it may be because the problem rarely occurs, so it only happens in very large runs that are executed in parallel using mclapply. But it is also possible that the error occurs, not because the tasks are executed in parallel, but because they are executed by forked processes. For example, various graphics operations will fail when executed in a forked process.
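
Since mclapply hides the error and just leaves NULLs behind, one practical defensive pattern is to scan the result list for NULLs and re-run those tasks serially with lapply so the underlying error surfaces. This is only a sketch, and it assumes NULL is never a legitimate return value of your task function (my_task below is a hypothetical stand-in, not the poisoned work function above):

library(parallel)
my_task <- function(i) sqrt(i)   # placeholder for the real per-task work

res <- mclapply(1:8, my_task, mc.cores = 2)

# any NULL here means the worker never delivered a result for that task
failed <- which(vapply(res, is.null, logical(1)))
if (length(failed) > 0) {
  warning("NULL results for tasks: ", paste(failed, collapse = ", "),
          "; re-running them serially to surface the real error")
  res[failed] <- lapply(failed, my_task)
}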



Answer 2:

I'm adding this answer so others hitting this question won't have to wade through the long thread of comments (I am the bounty granter but not the OP).

mclapply initially populates the list it creates with NULLs. As the worker processes return values, these values overwrite the NULLs. If a process dies without ever returning a value, mclapply will return a NULL for it.

When memory becomes low, the Linux out-of-memory killer (OOM killer, see https://lwn.net/Articles/317814/) will start silently killing processes. It does not print anything to the console to let you know what it's doing, although its activity does show up in the system log. In this situation the output of mclapply will appear to have been randomly contaminated with NULLs.
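
If you suspect the OOM killer, you can check the kernel log from R. A minimal sketch (my addition, not from the original answer; Linux only, and dmesg may require elevated privileges on some systems):

# look for OOM-killer messages in the kernel ring buffer
oom_lines <- tryCatch(
  grep("Out of memory|oom-killer", system("dmesg", intern = TRUE), value = TRUE),
  error = function(e) character(0)
)
if (length(oom_lines) > 0)
  message("Possible OOM-killer activity:\n", paste(oom_lines, collapse = "\n"))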