Creating folds for k-fold CV in R using Caret

I'm trying to set up k-fold CV for several classification methods/hyperparameters using the data available at

http://archive.ics.uci.edu/ml/machine-learning-databases/undocumented/connectionist-bench/sonar/sonar.all-data.

This set is made of 208 rows, each with 60 attributes. I'm reading it into a data.frame using the read.table function.

The next step is to split my data into k folds, let's say k = 5. My first attempt was to use

test <- createFolds(t, k=5)

I had two issues with this. The first one is that the fold sizes are not close to each other:

  Length Class  Mode   
Fold1 29 -none- numeric
Fold2 14 -none- numeric
Fold3 7 -none- numeric
Fold4 5 -none- numeric
Fold5 5 -none- numeric

The other one is that this apparently split my data according to the attribute indexes, but I want to split the rows themselves. I thought I could fix this by transposing my data.frame:

test <- t(myDataNumericValues)

But when I call the createFolds function, it gives me something like this:

  Length Class  Mode   
Fold1 2496 -none- numeric
Fold2 2496 -none- numeric
Fold3 2495 -none- numeric
Fold4 2496 -none- numeric
Fold5 2497 -none- numeric

The length issue was solved, but it's still not splitting my 208 rows into folds.

Any thoughts about what I can do? Do you think that the caret package is not the most appropriate one?

Thanks in advance

Tags: r cross-validation r-caret

2 answers

小情绪 Triste *

#2 · 2020-05-23 14:39

Please read ?createFolds to understand what the function does. It creates the indices that define which data are held out in the separate folds (see the returnTrain option to return the converse):

  > library(caret)
  > library(mlbench)
  > data(Sonar)
  > 
  > folds <- createFolds(Sonar$Class)
  > str(folds)
  List of 10
   $ Fold01: int [1:21] 25 39 58 63 69 73 80 85 90 95 ...
   $ Fold02: int [1:21] 19 21 42 48 52 66 72 81 88 89 ...
   $ Fold03: int [1:21] 4 5 17 34 35 47 54 68 86 100 ...
   $ Fold04: int [1:21] 2 6 22 29 32 40 60 65 67 92 ...
   $ Fold05: int [1:20] 3 14 36 41 45 75 78 84 94 104 ...
   $ Fold06: int [1:21] 10 11 24 33 43 46 50 55 56 97 ...
   $ Fold07: int [1:21] 1 7 8 20 23 28 31 44 71 76 ...
   $ Fold08: int [1:20] 16 18 26 27 38 57 77 79 91 99 ...
   $ Fold09: int [1:21] 13 15 30 37 49 53 74 83 93 96 ...
   $ Fold10: int [1:21] 9 12 51 59 61 62 64 70 82 87 ...
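The fold sizes reported in the question fit this reading: createFolds treats its first argument as a vector of outcomes, one element per sample. As a rough check in base R (the zero-filled matrix below is only a stand-in for the 208 x 60 numeric data), a transposed matrix passed as that vector has 12480 elements, and the five fold sizes in the question add up to exactly that:

```r
# Stand-in for the question's numeric data: 208 rows, 60 attributes
m <- matrix(0, nrow = 208, ncol = 60)

# A matrix is a vector with dimensions, so it counts as 208 * 60 "samples"
n_elements <- length(t(m))
n_elements

# Fold sizes reported in the question after transposing
reported <- c(2496, 2496, 2495, 2496, 2497)
sum(reported) == n_elements
```

Passing the class column instead (one value per row) gives indices into the 208 rows.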

To use these to split the data:

   > split_up <- lapply(folds, function(ind, dat) dat[ind,], dat = Sonar)
   > dim(Sonar)
   [1] 208  61
   > unlist(lapply(split_up, nrow))
   Fold01 Fold02 Fold03 Fold04 Fold05 Fold06 Fold07 Fold08 Fold09 Fold10 
       21     21     21     21     20     21     21     20     21     21
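For the five folds asked about in the question, the same call takes k = 5, and returnTrain = TRUE returns the training rows instead of the held-out rows. A short sketch, assuming the caret and mlbench packages are installed; the seed value is arbitrary:

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(1)  # arbitrary seed, only to make the folds reproducible

# Held-out row indices for each of 5 folds, stratified on the class label
test_idx <- createFolds(Sonar$Class, k = 5)

# The converse: row indices used for training in each fold
set.seed(1)
train_idx <- createFolds(Sonar$Class, k = 5, returnTrain = TRUE)

# Each of the 208 rows appears in exactly one held-out fold
all(sort(unlist(test_idx, use.names = FALSE)) == seq_len(nrow(Sonar)))
sapply(test_idx, length)
sapply(train_idx, length)
```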

The function train is used in this package to do the actual modeling, so you don't usually need to do the splitting yourself (see this page).
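A minimal sketch of letting train handle the resampling, assuming caret, mlbench and rpart are installed. The choice of rpart as the model and the seed are illustrative only, not something this answer prescribes:

```r
library(caret)
library(mlbench)
data(Sonar)

set.seed(1)  # arbitrary seed for reproducible resampling

# 5-fold cross-validation; train() creates the folds internally
ctrl <- trainControl(method = "cv", number = 5)

# method = "rpart" is an illustrative choice; any supported model would do
fit <- train(Class ~ ., data = Sonar, method = "rpart", trControl = ctrl)

# Cross-validated accuracy for each candidate value of the tuning parameter
fit$results
```

Folds built by hand can also be supplied through the index argument of trainControl, which expects training-row indices such as those returned by createFolds(..., returnTrain = TRUE).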

Max


等我变得足够好

#3 · 2020-05-23 14:47

I'm not familiar with the caret package, but I once wrote a function that calculates CV accuracy based on a decision tree from the rpart package. Of course, the function needs modifying to suit your purpose.

library(rpart)

CV <- function(form, x, fold = 10, cp = 0.01) {
  # x is the data frame; form is the model formula, e.g. Class ~ .
  n <- nrow(x)
  set.seed(7)
  # Randomly assign each row to one of `fold` groups of near-equal size
  k <- sample(rep(seq_len(fold), length.out = n))

  # Name of the response variable (left-hand side of the formula)
  y <- all.vars(form)[1]
  vec.accuracy <- numeric(fold)
  for (i in seq_len(fold)) {
    # It depends on which classification method you use
    fit <- rpart(form, data = x[k != i, ], method = "class")
    fit.prune <- prune(fit, cp = cp)
    fcast <- predict(fit.prune, newdata = x[k == i, ], type = "class")
    cm <- table(x[k == i, y], fcast)
    # Correct predictions lie on the diagonal of the confusion matrix
    vec.accuracy[i] <- sum(diag(cm)) / sum(cm)
  }
  avg.accuracy <- mean(vec.accuracy)
  data.frame(Accuracy = avg.accuracy, Error = 1 - avg.accuracy)
}
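
Whatever classifier goes inside the loop, the fold assignment is worth checking on its own. A minimal base R sketch, assuming 208 rows and 10 folds as for the Sonar data (the seed is arbitrary), that gives every row exactly one fold label and keeps the fold sizes within one row of each other:

```r
n <- 208     # number of rows, as in the Sonar data
fold <- 10   # number of folds

set.seed(7)
# Repeat the labels 1..fold until there are n of them, then shuffle
k <- sample(rep(seq_len(fold), length.out = n))

# Fold sizes differ by at most one row
sizes <- table(k)
sizes
range(sizes)
```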
