I would like to export a Caret random forest model using the pmml library so I can use it for predictions in Java. Here is a reproduction of the error I am getting.
data(iris)
require(caret)
require(pmml)
# Placeholder values; the original post does not define these constants.
NUMBER_OF_CV <- 10
REPEATES <- 3
NUMBER_OF_TREES <- 100
rfGrid2 <- expand.grid(.mtry = c(1, 2))
fitControl2 <- trainControl(
  method = "repeatedcv",
  number = NUMBER_OF_CV,
  repeats = REPEATES)
model.Test <- train(Species ~ .,
                    data = iris,
                    method = "rf",
                    trControl = fitControl2,
                    ntree = NUMBER_OF_TREES,
                    importance = TRUE,
                    tuneGrid = rfGrid2)
print(model.Test)
pmml(model.Test)
Error in UseMethod("pmml") :
no applicable method for 'pmml' applied to an object of class "c('train', 'train.formula')"
I searched for a while and found very little information about exporting to PMML in general. The pmml library does list a randomForest method:
methods(pmml)
[1] pmml.ada pmml.coxph pmml.cv.glmnet pmml.glm pmml.hclust pmml.itemsets pmml.kmeans
[8] pmml.ksvm pmml.lm pmml.multinom pmml.naiveBayes pmml.nnet pmml.randomForest pmml.rfsrc
[15] pmml.rpart pmml.rules pmml.svm
It works with a randomForest model fitted directly, but not with the caret-trained one.
library(randomForest)
iris.rf <- randomForest(Species ~ ., data=iris, ntree=20)
# Convert to pmml
pmml(iris.rf)
# this works!!!
str(iris.rf)
List of 19
$ call : language randomForest(formula = Species ~ ., data = iris, ntree = 20)
$ type : chr "classification"
$ predicted : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
...
str(model.Test)
List of 22
$ method : chr "rf"
$ modelInfo :List of 14
..$ label : chr "Random Forest"
..$ library : chr "randomForest"
..$ loop : NULL
..$ type : chr [1:2] "Classification" "Regression"
...
You cannot invoke the pmml method with train or train.formula types (i.e. this is the type of your model.Test object).
Caret documentation for the train method says that you can access the best model as the finalModel field. You can invoke the pmml method on that object then.
Unfortunately, it turns out that Caret specifies the RF model using the "matrix interface" (i.e. by setting the x and y fields), not using the more common "formula interface" (i.e. by setting the formula field). AFAIK, the "pmml" package does not support the export of such RF models.
So, it looks like your best option is to use a two-level approach. First, use the Caret package to find the most appropriate RF parametrization for your dataset. Second, train the final RF model manually using the "formula interface" with this parametrization.
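The two-level approach described above could be sketched roughly as follows. This is a minimal illustration, not a tested recipe: the fold count, ntree value, seed, and the variable names tuned, best_mtry, final.rf and final.pmml are assumptions chosen for the example and do not come from the question.

```r
library(caret)
library(randomForest)
library(pmml)

data(iris)
set.seed(42)  # illustrative seed, only to make the run repeatable

# Step 1: let caret select mtry by cross-validation.
# number = 5 and ntree = 50 are example values, not the asker's settings.
tuned <- train(Species ~ .,
               data = iris,
               method = "rf",
               trControl = trainControl(method = "cv", number = 5),
               tuneGrid = expand.grid(mtry = c(1, 2)),
               ntree = 50)

# caret fits the final model through the x/y interface, so the object
# is a randomForest, but it is expected to carry no formula terms.
class(tuned$finalModel)
is.null(tuned$finalModel$terms)

# The selected tuning parameter is stored in bestTune.
best_mtry <- tuned$bestTune$mtry

# Step 2: refit with the formula interface using the selected mtry.
final.rf <- randomForest(Species ~ ., data = iris,
                         mtry = best_mtry, ntree = 50)

# A formula-based randomForest object is what pmml() can dispatch on.
final.pmml <- pmml(final.rf)
```

The refit model is not the same set of trees that caret evaluated; it only reuses the selected mtry, so fixing the seed or comparing out-of-bag error may be worthwhile before exporting.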
You can use the r2pmml package to do the job.
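A minimal sketch of what that might look like, assuming the r2pmml package is installed along with a Java runtime (the package delegates the conversion to a Java tool). The output path and the variable names train.rf and out_file are hypothetical, and whether a given caret model type is supported should be checked against the r2pmml documentation.

```r
library(caret)
library(r2pmml)

data(iris)

# Train through caret as in the question, with default tuning settings.
train.rf <- train(Species ~ ., data = iris, method = "rf")

# Hypothetical output location; any writable path should do.
out_file <- file.path(tempdir(), "train-rf.pmml")

# r2pmml() takes the model object and the path of the PMML file to write.
r2pmml(train.rf, out_file)
```

The resulting file can then be loaded by a Java PMML evaluator for scoring.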