Passing an entire package to a snow cluster

2019-04-07 07:03发布

问题:

I'm trying to parallelize (using snow::parLapply) some code that depends on a package (ie, a package other than snow). Objects referenced in the function called by parLapply must be explicitly passed to the cluster using clusterExport. Is there any way to pass an entire package to the cluster rather than having to explicitly name every function (including a package's internal functions called by user functions!) in clusterExport?

回答1:

Install the package on all nodes, and have your code call library(thePackageYouUse) on all nodes via one the available commands, egg something like

 clusterApply(cl, library(thePackageYouUse))

I think the parallel package which comes with recent R releases has examples -- see for example here from help(clusterApply) where the boot package is loaded everywhere:

 ## A bootstrapping example, which can be done in many ways:
 clusterEvalQ(cl, {
   ## set up each worker.  Could also use clusterExport()
   library(boot)
   cd4.rg <- function(data, mle) MASS::mvrnorm(nrow(data), mle$m, mle$v)
   cd4.mle <- list(m = colMeans(cd4), v = var(cd4))
   NULL
 })