I am very new at machine learning and am attempting the forest cover prediction competition on Kaggle, but I am getting hung up pretty early on. I get the following error when I run the code below.
Error in train.default(x, y, weights = w, ...) :
final tuning parameters could not be determined
In addition: There were 50 or more warnings (use warnings() to see the first 50)
# Load the libraries
library(ggplot2); library(caret); library(AppliedPredictiveModeling)
library(pROC)
library(Amelia)
set.seed(1234)
# Load the forest cover dataset from the csv file
rawdata <- read.csv("train.csv",stringsAsFactors = F)
#this data won't be used in model evaluation. It will only be used for the submission.
test <- read.csv("test.csv",stringsAsFactors = F)
########################
### DATA PREPARATION ###
########################
#create a training and test set for building and evaluating the model
samples <- createDataPartition(rawdata$Cover_Type, p = 0.5,list = FALSE)
data.train <- rawdata[samples, ]
data.test <- rawdata[-samples, ]
model1 <- train(as.factor(Cover_Type) ~ Elevation + Aspect + Slope + Horizontal_Distance_To_Hydrology,
data = data.train,
method = "rf", prox = "TRUE")
The following should work:
model1 <- train(as.factor(Cover_Type) ~ Elevation + Aspect + Slope + Horizontal_Distance_To_Hydrology,
data = data.train,
method = "rf", tuneGrid = data.frame(mtry = 3))
It's always better to specify the tuneGrid parameter, which is a data frame of candidate tuning values. Look at ?randomForest and ?train for more information. rf has only one tuning parameter, mtry, which controls the number of predictors randomly sampled as candidates at each split.
You can also run modelLookup to get a list of tuning parameters for each model:
> modelLookup("rf")
# model parameter label forReg forClass probModel
#1 rf mtry #Randomly Selected Predictors TRUE TRUE TRUE
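To illustrate how tuneGrid can hold several candidate values rather than a single one, here is a minimal sketch. It uses the built-in iris data instead of the forest cover data, and the 5-fold cross-validation and ntree = 100 settings are illustrative assumptions, not values from the question. It also assumes the caret and randomForest packages are installed.

```r
library(caret)  # method = "rf" also needs the randomForest package installed

set.seed(1234)

# Candidate values of mtry; iris has 4 predictors, so mtry can range from 1 to 4
grid <- expand.grid(mtry = 1:4)

# 5-fold cross-validation (an assumption for this sketch; caret's default
# resampling scheme is the bootstrap)
ctrl <- trainControl(method = "cv", number = 5)

fit <- train(Species ~ ., data = iris,
             method = "rf",
             tuneGrid = grid,
             trControl = ctrl,
             ntree = 100)  # extra arguments are passed on to randomForest()

print(fit$results)   # resampled performance for each candidate mtry
print(fit$bestTune)  # the mtry value that train() selected
```

With a grid like this, train() fits the model for every row of the data frame and keeps the value with the best resampled performance, so the grid should only contain values that are valid for the number of predictors in the formula.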
I too am doing Kaggle competitions and have been using the 'caret' package to help choose the 'best' model parameters. After getting many of these errors, I looked into the code behind the scenes and discovered a call to a function called 'class2ind', which does not exist (at least not anywhere I could find). I eventually found a similar function called 'class.ind' in the 'nnet' package, so I decided to create a local function called 'class2ind' and paste in the code from 'class.ind'. And lo and behold, it worked!
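# fix for caret: local class2ind, copied from nnet's class.ind
class2ind <- function(cl)
{
  n <- length(cl)
  cl <- as.factor(cl)
  # one row per observation, one column per class level
  x <- matrix(0, n, length(levels(cl)))
  # put a 1 in each row at the column of that observation's class
  x[(1:n) + n * (unclass(cl) - 1)] <- 1
  dimnames(x) <- list(names(cl), levels(cl))
  x
}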
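For anyone wondering what this function actually produces, the short sketch below calls the original class.ind from nnet on a made-up three-element factor. The labels "a" and "b" are purely illustrative, and it assumes nnet is installed (it normally ships with R as a recommended package).

```r
# Toy class labels, invented for illustration only
labels <- factor(c("a", "b", "a"))

# One row per observation, one column per class level,
# with a 1 marking the class each observation belongs to
ind <- nnet::class.ind(labels)
print(ind)
```

Each row contains exactly one 1, so the matrix is just a one-hot encoding of the class labels.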