Error in train.default(x, y, weights = w, …) : fin

Posted 2019-02-11 01:08

I am very new to machine learning and am attempting the forest cover prediction competition on Kaggle, but I am getting stuck pretty early on. I get the following error when I run the code below.

Error in train.default(x, y, weights = w, ...) : 
final tuning parameters could not be determined
In addition: There were 50 or more warnings (use warnings() to see the first 50)

# Load the libraries
library(ggplot2); library(caret); library(AppliedPredictiveModeling)
library(pROC)
library(Amelia)

set.seed(1234)

# Load the forest cover dataset from the csv file
rawdata <- read.csv("train.csv",stringsAsFactors = F)
#this data won't be used in model evaluation. It will only be used for the submission.
test <- read.csv("test.csv",stringsAsFactors = F)

########################
### DATA PREPARATION ###
########################

#create a training and test set for building and evaluating the model
samples <- createDataPartition(rawdata$Cover_Type, p = 0.5,list = FALSE)
data.train <- rawdata[samples, ]
data.test <- rawdata[-samples, ]

model1 <- train(as.factor(Cover_Type) ~ Elevation + Aspect + Slope + Horizontal_Distance_To_Hydrology, 
                data = data.train, 
                method = "rf", prox = "TRUE")

2 Answers
Explosion°爆炸
#2 · 2019-02-11 01:40

The following should work:

model1 <- train(as.factor(Cover_Type) ~ Elevation + Aspect + Slope + Horizontal_Distance_To_Hydrology,
                          data = data.train,
                          method = "rf", tuneGrid = data.frame(mtry = 3))

It's generally better to specify the tuneGrid parameter, which is a data frame of candidate tuning values. Look at ?randomForest and ?train for more information. rf has only one tuning parameter, mtry, which controls the number of predictors randomly sampled as candidates at each split.

You can also run modelLookup to get a list of the tuning parameters for a given model:

> modelLookup("rf")
#  model parameter                         label forReg forClass probModel
#1    rf      mtry #Randomly Selected Predictors   TRUE     TRUE      TRUE
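
To illustrate the tuneGrid advice above, here is a minimal sketch that tries several mtry values instead of a single one. It is not the asker's setup: the built-in iris data stands in for the Kaggle training set, and the 5-fold cross-validation and ntree = 100 settings are assumptions chosen only to keep the run short. It needs the caret and randomForest packages installed.

```r
library(caret)  # train() with method = "rf" also needs the randomForest package

set.seed(1234)

# Assumption: iris is only a stand-in for the Kaggle training data.
# It has 4 predictors, so mtry can sensibly range from 1 to 4.
grid <- data.frame(mtry = 1:4)

# Assumption: 5-fold cross-validation, chosen to keep the example fast.
ctrl <- trainControl(method = "cv", number = 5)

fit <- train(Species ~ ., data = iris,
             method = "rf",
             trControl = ctrl,
             tuneGrid = grid,
             ntree = 100)  # passed through to randomForest()

fit$results   # one row of resampled performance per candidate mtry
fit$bestTune  # the mtry value that train() selected
```

Each row of the grid is one candidate; train() resamples every candidate and refits the final model with the best one.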
萌系小妹纸
#3 · 2019-02-11 01:50

I too am doing Kaggle competitions and have been using the 'caret' package to help with choosing the 'best' model parameters. After getting many of these errors I looked into the code behind the scenes and discovered a call to a function called 'class2ind', which does not exist (at least anywhere I know of). I finally found another function called 'class.ind', which is in the 'nnet' package. I decided to just try creating a local function called 'class2ind' and pop in the code from the 'class.ind' function. And lo and behold, it worked!

# fix for caret
class2ind <- function(cl)
{
        n <- length(cl)
        cl <- as.factor(cl)
        x <- matrix(0, n, length(levels(cl)) )
        x[(1:n) + n*(unclass(cl)-1)] <- 1
        dimnames(x) <- list(names(cl), levels(cl))
        x
}
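
For anyone unsure what this helper produces, here is a small sketch of it in use. The function is repeated so the snippet runs on its own in base R, and the three-element input vector is made up purely for illustration.

```r
# Same helper as above, repeated so this snippet is self-contained.
class2ind <- function(cl) {
  n <- length(cl)
  cl <- as.factor(cl)
  x <- matrix(0, n, length(levels(cl)))
  # Column-major indexing: row i, column (level code of element i)
  x[(1:n) + n * (unclass(cl) - 1)] <- 1
  dimnames(x) <- list(names(cl), levels(cl))
  x
}

# Hypothetical class labels, for illustration only
ind <- class2ind(c("a", "b", "a"))
print(ind)
#      a b
# [1,] 1 0
# [2,] 0 1
# [3,] 1 0
```

Each row has exactly one 1, in the column of that observation's class, so the result is a one-hot indicator matrix.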