Feature selection with caret rfe and training with

I'm trying to use caret's rfe function for feature selection, because I'm in a p >> n situation where most regression techniques that don't involve some form of regularisation perform poorly. I have already tried a few techniques with regularisation (Lasso), but what I want to do now is reduce the number of features so that I can run any kind of regression algorithm on the data at least decently.

control <- rfeControl(functions=rfFuncs, method="cv", number=5)
model <- rfe(trainX, trainY, rfeControl=control)
predict(model, testX)

Right now, if I do it like this, a feature selection algorithm using random forest will be run, and then the model with the best set of features, according to the 5-fold cross-validation, will be used for the prediction, right?

I'm curious about two things here: 1) Is there an easy way to take the selected set of features and train a different model on it than the one used for the feature selection? For example, reducing the number of features from 500 to the 20 or so that seem most important and then applying k-nearest neighbours.

I'm imagining an easy way to do it that would look like this:

control <- rfeControl(functions=rfFuncs, method="cv", number=5)
model <- rfe(trainX, trainY, method = "knn", rfeControl=control)
predict(model, testX)

2) Is there a way to tune the parameters of the feature selection algorithm? I would like to have some control over the values of mtry, the same way you can pass a grid of values when using the train function from caret. Is there a way to do such a thing with rfe?

Tags: r r-caret feature-selection rfe

1 answer

爱情/是我丢掉的垃圾

#2 · 2020-06-29 05:31

Here is a short example of how to perform rfe with a built-in model:

library(caret)
library(mlbench) #for the data
data(Sonar)

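# Outer resampling: 3-fold CV over the candidate subset sizes.
# caretFuncs makes rfe fit each subset with train(), so any caret model can be used.
rctrl1 <- rfeControl(method = "cv",
                     number = 3,
                     returnResamp = "all",
                     functions = caretFuncs,
                     saveDetails = TRUE)

# method, trControl and tuneGrid are passed on to train():
# they choose the model (knn), its inner resampling and its tuning grid.
model <- rfe(Class ~ ., data = Sonar,
             sizes = c(1, 5, 10, 15),
             method = "knn",
             trControl = trainControl(method = "cv",
                                      classProbs = TRUE),
             tuneGrid = data.frame(k = 1:10),
             rfeControl = rctrl1)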

model
#output
Recursive feature selection

Outer resampling method: Cross-Validated (3 fold) 

Resampling performance over subset size:

 Variables Accuracy  Kappa AccuracySD KappaSD Selected
         1   0.6006 0.1984    0.06783 0.14047         
         5   0.7113 0.4160    0.04034 0.08261         
        10   0.7357 0.4638    0.01989 0.03967         
        15   0.7741 0.5417    0.05981 0.12001        *
        60   0.7696 0.5318    0.06405 0.13031         

The top 5 variables (out of 15):
   V11, V12, V10, V49, V9

model$fit$results
#output
    k  Accuracy     Kappa AccuracySD   KappaSD
1   1 0.8082684 0.6121666 0.07402575 0.1483508
2   2 0.8089610 0.6141450 0.10222599 0.2051025
3   3 0.8173377 0.6315411 0.07004865 0.1401424
4   4 0.7842208 0.5651094 0.08956707 0.1761045
5   5 0.7941775 0.5845479 0.07367886 0.1482536
6   6 0.7841775 0.5640338 0.06729946 0.1361090
7   7 0.7932468 0.5821317 0.07545889 0.1536220
8   8 0.7687229 0.5333385 0.05164023 0.1051902
9   9 0.7982468 0.5918922 0.07461116 0.1526814
10 10 0.8030087 0.6024680 0.06117471 0.1229467
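The same mechanism should cover the mtry part of the question. The following is only a sketch, assuming the randomForest package is installed: it swaps method = "knn" for method = "rf" and passes a grid of mtry values. The grid values, the subset sizes, ntree = 100 and the seed are arbitrary choices for illustration, not recommendations.

```r
library(caret)
library(mlbench)  # for the Sonar data
data(Sonar)

set.seed(1)  # arbitrary seed, only for reproducibility of the sketch

# caretFuncs makes rfe call train() for every candidate subset
rf_ctrl <- rfeControl(method = "cv",
                      number = 3,
                      functions = caretFuncs)

# method, ntree, trControl and tuneGrid are forwarded to train();
# tuneGrid is where the mtry values to try are listed
rf_model <- rfe(Class ~ ., data = Sonar,
                sizes = c(5, 10, 15),
                method = "rf",
                ntree = 100,
                trControl = trainControl(method = "cv", number = 3),
                tuneGrid = data.frame(mtry = c(2, 4)),
                rfeControl = rf_ctrl)

# mtry chosen by the inner resampling for the final model
rf_model$fit$bestTune
```

Note that the tuning is repeated inside every outer fold and for every subset size, so the run time grows quickly with the size of the grid. An mtry larger than the number of predictors in a subset is reset by randomForest with a warning, so it helps to keep the grid below the smallest value in sizes.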

For more customization, see:

https://topepo.github.io/caret/recursive-feature-elimination.html
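If you would rather keep random forest for the selection step (rfFuncs, as in the question) and fit a different model afterwards, one possible approach is to extract the chosen variables with predictors() and pass only those columns to train(). This is a minimal sketch on the Sonar data, assuming the randomForest package is installed; the subset sizes, the k grid and the seed are arbitrary.

```r
library(caret)
library(mlbench)  # for the Sonar data
data(Sonar)

set.seed(1)  # arbitrary seed, only for reproducibility of the sketch

x <- Sonar[, setdiff(names(Sonar), "Class")]
y <- Sonar$Class

# Step 1: feature selection with random forest, as in the question
ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 5)
rf_rfe <- rfe(x, y, sizes = c(5, 10, 20), rfeControl = ctrl)

# Step 2: names of the variables in the best subset
selected <- predictors(rf_rfe)

# Step 3: train a different model (k-nearest neighbours) on those columns only
knn_fit <- train(x[, selected, drop = FALSE], y,
                 method = "knn",
                 preProcess = c("center", "scale"),
                 tuneGrid = data.frame(k = 1:10),
                 trControl = trainControl(method = "cv", number = 5))

knn_fit$bestTune
```

Because the features were chosen using the same data, the cross-validated accuracy reported by knn_fit is likely to be optimistic. A held-out test set (testX in the question) gives a fairer estimate, for example via predict(knn_fit, testX[, selected]).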
