I am using RStudio with Kaggle's Forest Cover data, and I keep getting an error when I try to use the knn3 function from caret. Here is my code:
library(caret)
train <- read.csv("C:/data/forest_cover/train.csv", header=T)
trainingRows <- createDataPartition(train$Cover_Type, p=0.8, list=F)
head(trainingRows)
train_train <- train[trainingRows,]
train_test <- train[-trainingRows,]
knnfit <- knn3(train_train[,-56], train_train$Cover_Type)
This last line gives me this in the console:
Error in knn3.matrix(x, y = y, k = k, ...) : y must be a factor
As the error message states, y must be a factor (here, y is the name of the second parameter to the function). In R, a factor variable is used to represent categorical data. You can turn y into a factor with factor(y), but it will just have the levels 1:7 for your data. If you want to give more meaningful values to your factor, try:
train$Cover_Type <- factor(train$Cover_Type, levels=1:7,
labels=c("Spruce/Fir","Lodgepole Pine","Ponderosa Pine",
"Cottonwood/Willow","Aspen",
"Douglas-fir","Krummholz"))
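Do this right after reading the data, before you create train_train and train_test, so that both subsets pick up the factor column. That will satisfy knn3 and give you more useful labels in the results.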
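For reference, here is a minimal sketch of the whole workflow with the conversion in the right place. It runs on a small synthetic data frame rather than the real train.csv, so the names toy, x1, x2, rows, and predictors are made up for illustration, and k = 5 is just knn3's default written out explicitly.

```r
library(caret)

set.seed(42)  # make the synthetic data and the partition reproducible

# Hypothetical stand-in for train.csv: two numeric predictors
# and an integer class code from 1 to 7
toy <- data.frame(
  x1 = rnorm(210),
  x2 = rnorm(210),
  Cover_Type = rep(1:7, each = 30)
)

# Convert the integer codes to a labelled factor BEFORE partitioning
toy$Cover_Type <- factor(toy$Cover_Type, levels = 1:7,
                         labels = c("Spruce/Fir", "Lodgepole Pine", "Ponderosa Pine",
                                    "Cottonwood/Willow", "Aspen",
                                    "Douglas-fir", "Krummholz"))

rows <- createDataPartition(toy$Cover_Type, p = 0.8, list = FALSE)
toy_train <- toy[rows, ]
toy_test  <- toy[-rows, ]

# Select predictors by name instead of by column position
predictors <- setdiff(names(toy_train), "Cover_Type")

# y is now a factor, so knn3 no longer complains
fit <- knn3(toy_train[, predictors], toy_train$Cover_Type, k = 5)

# Predicted classes come back as a factor with the same labels
pred <- predict(fit, toy_test[, predictors], type = "class")
table(pred, toy_test$Cover_Type)
```

Dropping the outcome by name is optional, but it avoids relying on Cover_Type being column 56 if the file layout ever changes.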