Ensemble different datasets in R

I am trying to combine signals from different models using the example described here. I have different datasets that predict the same output. However, when I combine the trained models into a caretList and ensemble them, I get this error:

Error in check_bestpreds_resamples(modelLibrary) : 
  Component models do not have the same re-sampling strategies

Here is a reproducible example:

library(caret)
library(caretEnsemble)
df1 <-
  data.frame(x1 = rnorm(200),
             x2 = rnorm(200),
             y = as.factor(sample(c("Jack", "Jill"), 200, replace = TRUE)))

df2 <-
  data.frame(z1 = rnorm(400),
             z2 = rnorm(400),
             y = as.factor(sample(c("Jack", "Jill"), 400, replace = TRUE)))

check_1 <- train( x = df1[,1:2],y = df1[,3],
                 method = "nnet",
                 tuneLength = 10,
                 trControl = trainControl(method = "cv",
                                          classProbs = TRUE,
                                          savePredictions = TRUE))

check_2 <- train( x = df2[,1:2],y = df2[,3] ,
                 method = "nnet",
                 preProcess = c("center", "scale"),
                 tuneLength = 10,
                 trControl = trainControl(method = "cv",
                                          classProbs = TRUE,
                                          savePredictions = TRUE))


combine <- c(check_1, check_2)
ens <- caretEnsemble(combine)

Tags: r r-caret ensemble-learning

1 answer

Bombasti

Reply #2 · 2019-08-22 06:28

First of all, you are trying to combine two models trained on different training datasets. That is not going to work: all models in an ensemble need to be trained on the same training set. With different datasets, each trained model has a different set of resamples, hence your current error.

Also, building your models without caretList is risky, because there is a big chance of ending up with different resampling strategies. You can control that a bit better by setting the index argument in trainControl (see the vignette).

If you use one dataset, you can use the following code:
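
To illustrate the index argument mentioned above, here is a minimal sketch, assuming a single dataset shaped like df1 from the question. The name folds is hypothetical, and the seed is arbitrary. It builds the cross-validation training indexes once with createFolds and hands them to trainControl, so that every model trained with this control object is resampled on the same rows.

```r
library(caret)

set.seed(1324)
df1 <- data.frame(x1 = rnorm(200),
                  x2 = rnorm(200),
                  y = as.factor(sample(c("Jack", "Jill"), 200, replace = TRUE)))

# Create the training-row indexes for 5-fold CV once.
# returnTrain = TRUE returns the rows used for fitting in each fold,
# which is what the index argument of trainControl expects.
folds <- createFolds(df1$y, k = 5, returnTrain = TRUE)

# Fix the resampling indexes in the control object.
ctrl <- trainControl(method = "cv",
                     number = 5,
                     index = folds,
                     classProbs = TRUE,
                     savePredictions = "final")
```

Passing this same ctrl as trControl to each train() call (or to caretList) should give all models identical resamples, which is what the resampling check in caretEnsemble looks for.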

ctrl <- trainControl(method = "cv",
                     number = 5,
                     classProbs = TRUE,
                     savePredictions = "final")

set.seed(1324)
# will generate the following warning:
# indexes not defined in trControl.  Attempting to set them ourselves, so 
# each model in the ensemble will have the same resampling indexes.
models <- caretList(x = df1[,1:2],
                    y = df1[,3] ,
                    trControl = ctrl,
                    tuneList = list(
                      check_1 = caretModelSpec(method = "nnet", tuneLength = 10),
                      check_2 = caretModelSpec(method = "nnet", tuneLength = 10, preProcess = c("center", "scale"))
                    )) 


ens <- caretEnsemble(models)


A glm ensemble of 2 base models: nnet, nnet

Ensemble results:
Generalized Linear Model 

200 samples
  2 predictor
  2 classes: 'Jack', 'Jill' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 200, 200, 200, 200, 200, 200, ... 
Resampling results:

  Accuracy   Kappa     
  0.5249231  0.04164767

Also read this guide on different ensemble strategies.
