Foreach loop unable to find object

Posted 2020-02-14 06:29

Question:

I am trying to use foreach with a parallel backend to speed up a computation (cross-validation of an {AUCRF} random forest for feature selection, in case that matters). In the process I need to take a subset of a vector. The name of the vector can change, but it is available as a character string. I used the eval(parse()) construct (is that a good idea?) to get the subset.

Example:

library(parallel)
library(foreach)
library(doParallel)  # provides registerDoParallel()
library(stats)

#create cluster
clu <- makeCluster(detectCores() - 1)
registerDoParallel(clu, cores = detectCores() - 1)

bar<-c("a","b","c","d")
rab<-c(2,3)
bar.name<-"bar"

#expected output in this example is a list containing ten times
bar[rab]
#or
eval(parse(text=paste(bar.name,"[rab]",sep="")))

foo<-foreach(m = 1:10, .packages = c("stats")) %dopar% {
  sink("foreach.txt")
  print(bar.name)
  print(parse(text=paste(bar.name,"[rab]",sep="")))
  print(eval(parse(text=paste(bar.name,"[rab]",sep=""))))
  foo.temp<-eval(parse(text=paste(bar.name,"[rab]",sep="")))
  return(foo.temp)
}
sink()
stopCluster(clu)

However, I get the following error:

Error in { : task 1 failed - "Object 'bar' not found"

I thought that each worker gets a copy of the workspace with all objects. Any idea what I'm doing wrong?

Answer 1:

This sounds like a bad design. It's almost never necessary to use eval(parse()).
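To illustrate why eval(parse()) is usually avoided, here is a minimal base R sketch. The variable bad.name and the name side_effect are hypothetical, invented only to show what happens when the string is not a plain variable name: eval(parse()) runs whatever code the string contains, whereas a lookup by name simply fails.

```r
bar <- c("a", "b", "c", "d")
rab <- c(2, 3)
bar.name <- "bar"

# For a well-formed name, both approaches return the same subset
via_parse <- eval(parse(text = paste0(bar.name, "[rab]")))
via_get <- get(bar.name)[rab]
identical(via_parse, via_get)

# Hypothetical malformed input: the string carries extra statements
bad.name <- "bar[1]; side_effect <- TRUE; bar"

# eval(parse()) executes every statement in the string,
# including the embedded assignment
eval(parse(text = paste0(bad.name, "[rab]")))
exists("side_effect")

# get() treats the whole string as one object name and fails instead
get_result <- tryCatch(get(bad.name), error = function(e) "lookup failed")
get_result
```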

To look up a variable by name, get() is somewhat safer, as in get(bar.name)[rab]. But you're still running into an environment issue here. Since the names bar and rab never appear as variables in the body of the %dopar% block (they occur only inside a string), they are not exported to the environment where foreach runs the code. You can fix that by setting the .export parameter of foreach explicitly to make sure those variables are exported. Here I switch to get() and only have to export bar explicitly, because rab now appears in the body of the loop.

foo<-foreach(m = 1:10, .export=c("bar"), .packages = c("stats")) %dopar% {
  get(bar.name)[rab]
}
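For reference, a self-contained sketch of the .export fix, including cluster setup and teardown. It assumes the foreach and doParallel packages are installed; the worker count of 2 is an arbitrary choice for illustration, not something taken from the question.

```r
library(foreach)
library(doParallel)  # assumed installed; also attaches parallel

# Two workers is an arbitrary choice for this sketch
clu <- makeCluster(2)
registerDoParallel(clu)

bar <- c("a", "b", "c", "d")
rab <- c(2, 3)
bar.name <- "bar"

# bar.name and rab appear as symbols in the loop body, so foreach exports
# them automatically; bar is only reached through get(), so it has to be
# listed in .export
foo <- foreach(m = 1:10, .export = "bar") %dopar% {
  get(bar.name)[rab]
}

stopCluster(clu)

# Inspect the first two of the ten results
str(foo[1:2])
```

Dropping .export = "bar" from this sketch should bring back the "not found" error from the question, since the workers then never receive bar.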

A better idea would be, rather than specifying a variable name, to specify an element of a named list. For example:

baz <- list(bar=letters[1:4], bob=letters[5:7])

Then you could do

baz.name <- "bar"
rab <- c(2,4)

foo<-foreach(m = 1:10, .packages = c("stats")) %dopar% {
  baz[[baz.name]][rab]
}

And because %dopar% can see the variables baz, baz.name, and rab, you don't have to export anything.
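Put together as a runnable sketch, the named-list version might look like the following. As before, it assumes foreach and doParallel are installed, and the two-worker cluster is an arbitrary choice for illustration.

```r
library(foreach)
library(doParallel)  # assumed installed; also attaches parallel

# Two workers is an arbitrary choice for this sketch
clu <- makeCluster(2)
registerDoParallel(clu)

baz <- list(bar = letters[1:4], bob = letters[5:7])
baz.name <- "bar"
rab <- c(2, 4)

# baz, baz.name and rab all appear as symbols in the body,
# so no .export argument is needed
foo <- foreach(m = 1:10) %dopar% {
  baz[[baz.name]][rab]
}

stopCluster(clu)

# Inspect the first two of the ten results
str(foo[1:2])
```

Switching to the other vector is then just a matter of setting baz.name to "bob"; no code has to be built from strings.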