caret gives me the error below. I'm training an SVM for prediction starting from a bag of words and wanted to use caret to tune the C parameter; however:
bow.model.svm.tune <- train(
  Training.match ~ .,
  data = data.frame(
    Training.match = factor(Training.Data.old$Training.match, labels = c('no match', 'match')),
    Text.features.dtm.df
  ) %>%
    filter(Training.Data.old$Data.tipe == 'train'),
  method = 'svmRadial',
  tuneLength = 9,
  preProc = c("center", "scale"),
  metric = "ROC",
  trControl = trainControl(
    method = "repeatedcv",
    repeats = 5,
    summaryFunction = twoClassSummary,
    classProbs = TRUE
  )
)
Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to no.match, match . Please use factor levels that can be used as valid R variable names (see ?make.names for help).
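The message refers to how the factor levels would be turned into column names for the class probabilities. A minimal sketch in base R, using a hypothetical toy factor (`outcome`) rather than the real data, shows what `make.names()` does to these labels and one possible way of making the levels syntactically valid; whether relabelling like this is acceptable for the real outcome is an assumption:

```r
# Hypothetical toy outcome; the real one is Training.Data.old$Training.match
outcome <- factor(c(1, 0, 0, 1), labels = c("no match", "match"))

# The current levels, and what they would become as R variable names
current.levels <- levels(outcome)
valid.levels <- make.names(current.levels)
print(current.levels)
print(valid.levels)

# The levels are syntactically valid only if make.names() leaves them unchanged
levels.are.valid <- identical(current.levels, valid.levels)
print(levels.are.valid)

# One option (an assumption, not the only fix): relabel with the valid names
levels(outcome) <- valid.levels
```

The space in 'no match' is what makes the first label invalid as a variable name; 'match' is left unchanged.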
The original e1071::svm() function doesn't give problems, so I suppose the error arises in the tuning phase:
bow.model.svm.tune <- svm(
  Training.match ~ .,
  data = data.frame(
    Training.match = factor(Training.Data.old$Training.match, labels = c('no match', 'match')),
    Text.features.dtm.df
  ) %>%
    filter(Training.Data.old$Data.tipe == 'train')
)
The data is simply an outcome factor variable and a set of TF-IDF-transformed word vectors:
'data.frame': 1796 obs. of 1697 variables:
$ Training.match : Factor w/ 2 levels "no match","match": 2 1 1 1 1 1 1 1 2 1 ...
$ azienda : num 0.12 0 0 0 0 ...
$ bus : num 0.487 0 0 0 0 ...
$ locale : num 0.275 0 0 0 0 ...
$ martini : num 0.852 0.741 0.947 0.947 0.501 ...
$ osp : num 0.339 0 0 0 0 ...
$ ospedale : num 0.0389 0.0676 0.0864 0.0864 0.0915 ...