R foreach parallel processing with an unexported function

Posted 2019-09-16 13:43

Question:

I am trying to extract rules from a C50 model while processing in parallel. This answer helped me extract the rules from the model object. However, because the models need to be processed in parallel, I am using foreach. This seems to cause a problem with the unexported function, which does not see the data object. Here is some reproducible code:

library(foreach)
library(doMC)
registerDoMC(2)

j = c(1,2)
result = foreach(i = j) %dopar% {
  library(C50)
  d = iris
  model <- C5.0(Species ~ ., data = d)
  modParty <- C50:::as.party.C5.0(model)
  return(modParty)
}

In this case it just fits the same model twice. In my real code, d is a constantly changing sample that is also generated inside the foreach loop.

My debugging showed that the offending line is modParty <- C50:::as.party.C5.0(model). It throws the error

Error in { : task 1 failed - "Object 'd' not found"

even though d is definitely available to each worker in the cluster. I checked that by logging to a file via loginfo(ls()) from the logging package.

Why does the function not see the object d? Any help greatly appreciated.

As additional info, here is the traceback():

> traceback()
3: stop(simpleError(msg, call = expr))
2: e$fun(obj, substitute(ex), parent.frame(), e$data)
1: foreach(i = j) %dopar% {
       library(C50)
       d = iris
       model <- C5.0(Species ~ ., data = d)
       modParty <- C50:::as.party.C5.0(model)
       return(modParty)
   }

Edit

Just for clarification: this has nothing to do with foreach. The same error occurs with an ordinary function:

library(C50)

d = iris

getC50Party = function(dat){
  model <- C5.0(Species ~ ., data = dat)
  modParty <- C50:::as.party.C5.0(model)
  return(modParty)
}

c50Party = getC50Party(d)

Error in { : task 1 failed - "Object 'dat' not found"

The problem is that as.party.C5.0 tries to look up the data object in the global workspace, where a variable that is local to a function (or to a foreach task) does not exist.
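To make the lookup problem concrete, here is a minimal base-R sketch of the kind of mechanism that could produce this error. The names toy_fit, toy_data_global and wrapper are hypothetical stand-ins, and this is not the actual C50 code: it only assumes that the model object stores its call and that the data is later re-evaluated by name in the global environment.

```r
# Hypothetical stand-ins for illustration only; NOT the C50 implementation.
toy_fit <- function(formula, data) {
  # Store only the unevaluated call, so `data` is remembered by name.
  structure(list(call = match.call()), class = "toy_fit")
}

toy_data_global <- function(fit) {
  # Re-evaluate the `data` argument by name in the global environment.
  eval(fit$call$data, envir = globalenv())
}

wrapper <- function(dat) {
  fit <- toy_fit(Species ~ ., data = dat)
  # `dat` exists only inside this function, not in the global environment.
  toy_data_global(fit)
}

# Capture the error message instead of stopping the script.
res <- tryCatch(wrapper(iris), error = function(e) conditionMessage(e))
print(res)
```

Under these assumptions the lookup fails because the name dat is only defined inside wrapper, which mirrors the situation inside getC50Party and inside the foreach body.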

Answer 1:

This is a bug. We do follow Achim's advice and use the terms object except when we get the case wrong.
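One possible reading of the remark about the terms object, sketched in base R: a terms object keeps the environment in which the formula was created, so the stored call can be re-evaluated there instead of in the global environment. The names toy_fit, toy_data_terms and wrapper are hypothetical, and this is an illustration of the general pattern, not the code in C50.

```r
# Hypothetical stand-ins for illustration only; NOT the C50 implementation.
toy_fit <- function(formula, data) {
  # terms() keeps the environment in which the formula was created.
  list(call = match.call(), terms = terms(formula, data = data))
}

toy_data_terms <- function(fit) {
  # Look the data up where the formula was written, not in the global env.
  eval(fit$call$data, envir = environment(fit$terms))
}

wrapper <- function(dat) {
  # The formula is created here, so its environment is this function's frame.
  fit <- toy_fit(Species ~ ., data = dat)
  toy_data_terms(fit)
}

recovered <- wrapper(iris)
print(identical(recovered, iris))
```

In this sketch the local variable dat is found because the formula's environment is the frame of wrapper, which is where dat lives.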

Try installing from github via

devtools::install_github("topepo/C5.0/pkg/C50")

Your examples work on this version.