I am trying to extract rules from a C50 model while processing in parallel. This answer helped me extract the rules from the model object. However, because I need the models to be processed in parallel, I am using foreach. This seems to cause a problem with the non-exported function, which does not see the data object. Here is some reproducible code:
library(foreach)
library(doMC)
registerDoMC(2)
j = c(1,2)
result = foreach(i = j) %dopar% {
  library(C50)
  d = iris
  model <- C5.0(Species ~ ., data = d)
  modParty <- C50:::as.party.C5.0(model)
  return(modParty)
}
In this case it just calculates the same model twice. In my real code, d is a constantly changing sample that is also generated inside the foreach loop.
My debugging showed that the offending line is modParty <- C50:::as.party.C5.0(model). It throws the error
Error in { : task 1 failed - "Object 'd' not found"
even though d is definitely available to each worker in the cluster. I checked that by logging to a file via loginfo(ls()) from the logging package.
Why does the function not see the object d? Any help is greatly appreciated.
As additional info, here is the output of traceback():
> traceback()
3: stop(simpleError(msg, call = expr))
2: e$fun(obj, substitute(ex), parent.frame(), e$data)
1: foreach(i = j) %dopar% {
library(C50)
d = iris
model <- C5.0(Species ~ ., data = d)
modParty <- C50:::as.party.C5.0(model)
return(modParty)
}
Edit
To clarify: this has nothing to do with foreach. The same error occurs with a plain function:
library(C50)
d = iris
getC50Party = function(dat){
  model <- C5.0(Species ~ ., data = dat)
  modParty <- C50:::as.party.C5.0(model)
  return(modParty)
}
c50Party = getC50Party(d)
Error in { : task 1 failed - "Object 'dat' not found"
The problem is that as.party.C5.0 tries to look up the data object in the global workspace instead of in the environment of the calling function.
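To illustrate the mechanism I suspect, here is a minimal base-R sketch that does not use C50 at all. The functions fit_toy and refetch_data are hypothetical stand-ins that I made up for illustration: the first stores the unevaluated call the way many modelling functions do, and the second re-evaluates the stored data argument from the global environment. I am assuming that as.party.C5.0 does something comparable internally; I have not confirmed this against the C50 source.

```r
# Hypothetical stand-in for a model fit: it only stores the unevaluated call,
# so the `data` argument is kept as the symbol `dat`, not as the data itself.
fit_toy <- function(formula, data) {
  structure(list(call = match.call()), class = "toy_model")
}

# Hypothetical stand-in for a converter that looks the data up again later,
# starting from the global environment instead of the caller's environment.
refetch_data <- function(obj) {
  eval(obj$call$data, envir = globalenv())
}

# `dat` exists only inside this function, so the later lookup fails.
wrapper <- function(dat) {
  model <- fit_toy(Species ~ ., data = dat)
  refetch_data(model)
}
res <- tryCatch(wrapper(iris), error = function(e) conditionMessage(e))
print(res)  # an "object not found" message mentioning 'dat'

# In this toy setting, replacing the symbol in the stored call with the
# evaluated data frame removes the dependency on the lookup environment.
wrapper_fixed <- function(dat) {
  model <- fit_toy(Species ~ ., data = dat)
  model$call$data <- dat
  refetch_data(model)
}
print(identical(wrapper_fixed(iris), iris))
```

Whether the same substitution of the call's data argument would work on a real C5.0 object is untested; the sketch only shows why a variable that is visible inside the function can still be "not found" when a stored call is evaluated elsewhere.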