Error exporting a caret random forest model to PMML

Posted 2020-04-12 10:25

Question:

I would like to export a caret random forest model with the pmml library so that I can use it for predictions in Java. Here is a reproduction of the error I am getting.

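data(iris)
library(caret)
library(pmml)

# Illustrative values only: the original post did not define these constants.
NUMBER_OF_CV <- 5
REPEATES <- 2
NUMBER_OF_TREES <- 50

rfGrid2 <- expand.grid(.mtry = c(1,2))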
fitControl2 <- trainControl(
  method = "repeatedcv",
  number = NUMBER_OF_CV, 
  repeats = REPEATES)

model.Test <- train(Species ~ .,
  data = iris,
  method ="rf",
  trControl = fitControl2,
  ntree = NUMBER_OF_TREES,
  importance = TRUE,  
  tuneGrid = rfGrid2)

print(model.Test)
pmml(model.Test)

Error in UseMethod("pmml") : 
  no applicable method for 'pmml' applied to an object of class "c('train', 'train.formula')"

I searched for a while and found little information about exporting to PMML in general. The pmml library does list a randomForest method:

methods(pmml)
 [1] pmml.ada          pmml.coxph        pmml.cv.glmnet    pmml.glm          pmml.hclust       pmml.itemsets     pmml.kmeans      
 [8] pmml.ksvm         pmml.lm           pmml.multinom     pmml.naiveBayes   pmml.nnet         pmml.randomForest pmml.rfsrc       
[15] pmml.rpart        pmml.rules        pmml.svm 

It works with a model fitted directly by randomForest, but not with the one trained through caret.

library(randomForest)
iris.rf <- randomForest(Species ~ ., data=iris, ntree=20)
# Convert to pmml
pmml(iris.rf)
# this works!!!
str(iris.rf)

List of 19
 $ call           : language randomForest(formula = Species ~ ., data = iris, ntree = 20)
 $ type           : chr "classification"
 $ predicted      : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
...

str(model.Test)
List of 22
 $ method      : chr "rf"
 $ modelInfo   :List of 14
  ..$ label     : chr "Random Forest"
  ..$ library   : chr "randomForest"
  ..$ loop      : NULL
  ..$ type      : chr [1:2] "Classification" "Regression"
...

Answer 1:

You cannot invoke the pmml method on objects of class train or train.formula (i.e. the class of your model.Test object), because the pmml package defines no method for those classes.

The caret documentation for train says that the best model is available as the finalModel element. You can then invoke the pmml method on that object.

rf = model.Test$finalModel
pmml(rf)

Unfortunately, it turns out that caret fits the RF model through the "matrix interface" (i.e. by passing x and y), not through the more common "formula interface" (i.e. by passing a formula). AFAIK, the "pmml" package does not support the export of such RF models.
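The difference between the two interfaces can be seen by inspecting the fitted objects. The following is a minimal sketch, not a statement about pmml internals: it assumes that a formula-fitted randomForest object carries a terms element while caret's finalModel does not, and the object names fit and direct_rf and the parameter values are chosen here purely for illustration.

```r
library(caret)
library(randomForest)

data(iris)
set.seed(1)

# Model trained through caret (illustrative settings, not from the question).
fit <- train(Species ~ ., data = iris, method = "rf",
             trControl = trainControl(method = "cv", number = 3),
             tuneGrid = expand.grid(mtry = 2), ntree = 20)

# Model trained directly with the formula interface.
direct_rf <- randomForest(Species ~ ., data = iris, ntree = 20)

# The wrapper is a "train" object; the inner model is a plain randomForest.
class(fit)
class(fit$finalModel)

# Assumption being checked: only the formula-fitted model stores terms.
is.null(fit$finalModel[["terms"]])
is.null(direct_rf[["terms"]])

# The recorded calls show which interface was used in each case.
fit$finalModel$call
direct_rf$call
```

If the first is.null() call returns TRUE and the second FALSE, the two objects differ in exactly the way described above, even though both have class randomForest.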

So it looks like your best option is a two-step approach. First, use the caret package to find the most appropriate RF parametrization for your dataset. Second, train the final RF model manually through the "formula interface" with this parametrization.
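The two steps could look roughly like this. It is a sketch under assumptions: the resampling settings, the tree count, the output file name and the variable names (tuned, best_mtry, final_rf, out_file) are illustrative and not taken from the question, and the file is written with saveXML from the XML package, which is one common way to serialize the object returned by pmml().

```r
library(caret)
library(randomForest)
library(pmml)

data(iris)
set.seed(42)

# Step 1: let caret choose mtry by repeated cross-validation.
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 2)
tuned <- train(Species ~ ., data = iris, method = "rf",
               trControl = ctrl,
               tuneGrid = expand.grid(mtry = c(1, 2)),
               ntree = 50)
best_mtry <- tuned$bestTune$mtry

# Step 2: refit with the formula interface, reusing the tuned mtry.
final_rf <- randomForest(Species ~ ., data = iris,
                         mtry = best_mtry, ntree = 50)

# Step 3: convert the formula-fitted model to PMML and write it to disk.
out_file <- file.path(tempdir(), "iris-rf.pmml")
XML::saveXML(pmml(final_rf), file = out_file)
```

Note that the refitted forest is a new random draw, so it will not be tree-for-tree identical to tuned$finalModel; only the tuned parameter value carries over.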



Answer 2:

You can use the r2pmml package to do the job:

library("caret")
library("r2pmml")

data(iris)

train.rf = train(Species ~ ., data = iris, method = "rf")
print(train.rf)
r2pmml(train.rf, "/tmp/train-rf.pmml")