R's caret training errors when y is not a factor

Posted 2020-04-16 04:14

Question:

I am using RStudio with Kaggle's forest cover data, and I keep getting an error when I try to use the knn3 function in caret. Here is my code:

library(caret)
train <- read.csv("C:/data/forest_cover/train.csv", header=TRUE)
trainingRows <- createDataPartition(train$Cover_Type, p=0.8, list=FALSE)
head(trainingRows)
train_train <- train[trainingRows,]
train_test <- train[-trainingRows,]

knnfit <- knn3(train_train[,-56], train_train$Cover_Type)

The last line produces this error in the console:

Error in knn3.matrix(x, y = y, k = k, ...) : y must be a factor

Answer 1:

As the error message states, y must be a factor (y is the name of the second argument to knn3). In R, a factor represents categorical data; read.csv loads Cover_Type as an integer column, so knn3 rejects it. You can convert it with factor(y), but the levels will then simply be 1 through 7 for your data. To give the levels more meaningful labels, try:

train$Cover_Type <- factor(train$Cover_Type, levels=1:7, 
    labels=c("Spruce/Fir","Lodgepole Pine","Ponderosa Pine",
    "Cottonwood/Willow","Aspen",
    "Douglas-fir","Krummholz"))

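That satisfies knn3 and gives you more readable labels in the results. Run the conversion before you split the data, so that train_train and train_test both inherit the factor column.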
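
For reference, here is a minimal end-to-end sketch of the fix. It does not use the Kaggle file: the data frame toy and its predictor columns x1 and x2 are made-up stand-ins, and k = 5 is an arbitrary choice. The knn3 part runs only if caret is installed.

```r
# Synthetic stand-in for the Kaggle data (toy, x1 and x2 are hypothetical);
# Cover_Type holds integer codes 1..7, as it does after read.csv.
set.seed(1)
toy <- data.frame(
  x1 = rnorm(70),
  x2 = rnorm(70),
  Cover_Type = rep(1:7, each = 10)
)

# Convert the response to a labelled factor BEFORE splitting the data.
cover_labels <- c("Spruce/Fir", "Lodgepole Pine", "Ponderosa Pine",
                  "Cottonwood/Willow", "Aspen", "Douglas-fir", "Krummholz")
toy$Cover_Type <- factor(toy$Cover_Type, levels = 1:7, labels = cover_labels)

if (requireNamespace("caret", quietly = TRUE)) {
  # With a factor response, createDataPartition samples within each class.
  idx <- caret::createDataPartition(toy$Cover_Type, p = 0.8, list = FALSE)
  toy_train <- toy[idx, ]
  toy_test  <- toy[-idx, ]

  # Select predictors by name instead of a hard-coded column index.
  predictors <- setdiff(names(toy), "Cover_Type")
  knnfit <- caret::knn3(as.matrix(toy_train[, predictors]),
                        toy_train$Cover_Type, k = 5)

  # Predicted classes use the descriptive labels defined above.
  pred <- predict(knnfit, as.matrix(toy_test[, predictors]), type = "class")
  print(table(predicted = pred, actual = toy_test$Cover_Type))
}
```

Selecting the predictors by name rather than with [,-56] is only a suggestion; it keeps the call correct if the column order ever changes.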