Caret objecting to outcomes labels: Error: At leas

Posted 2019-09-15 19:45

Question:

caret gives me the error below. I'm training an SVM for prediction, starting from a bag of words, and wanted to use caret to tune the C parameter. However, the following call fails:

bow.model.svm.tune <- train(Training.match ~ ., data = data.frame(
    Training.match = factor(Training.Data.old$Training.match, labels = c('no match', 'match')),
    Text.features.dtm.df) %>%
        filter(Training.Data.old$Data.tipe == 'train'),
    method = 'svmRadial',
    tuneLength = 9,
    preProc = c("center","scale"),
    metric="ROC",
    trControl = trainControl(
        method="repeatedcv",
        repeats = 5,
        summaryFunction = twoClassSummary,
        classProbs = TRUE))

Error: At least one of the class levels is not a valid R variable name; This will cause errors when class probabilities are generated because the variables names will be converted to no.match, match . Please use factor levels that can be used as valid R variable names (see ?make.names for help).

The original e1071::svm() function doesn't give any problems, so I suppose the error arises in the tuning phase:

bow.model.svm.tune <- svm(Training.match ~ ., data = data.frame(
             Training.match = factor(Training.Data.old$Training.match, labels = c('no match', 'match')),
             Text.features.dtm.df) %>%
                 filter(Training.Data.old$Data.tipe == 'train'))

The data is simply an outcome factor variable plus a set of TF-IDF-transformed word vectors:

'data.frame':   1796 obs. of  1697 variables:
 $ Training.match          : Factor w/ 2 levels "no match","match": 2 1 1 1 1 1 1 1 2 1 ...
 $ azienda                 : num  0.12 0 0 0 0 ...
 $ bus                     : num  0.487 0 0 0 0 ...
 $ locale                  : num  0.275 0 0 0 0 ...
 $ martini                 : num  0.852 0.741 0.947 0.947 0.501 ...
 $ osp                     : num  0.339 0 0 0 0 ...
 $ ospedale                : num  0.0389 0.0676 0.0864 0.0864 0.0915 ...

Answer 1:

When class probabilities are predicted (internally by train, or when you call predict.train yourself), the functions create one new column per class, named after the class level. data.frame converts "no match" to "no.match", so any code that expects a column called "no match" won't find it and will throw an error. That is why caret asks for factor levels that are already valid R variable names.
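
The name conversion can be reproduced in base R without caret. The following is only a minimal sketch: the toy vector outcome is a hypothetical stand-in for Training.Data.old$Training.match, and the final comparison is a paraphrase of the kind of check caret appears to perform, not its exact code.

```r
# Hypothetical toy outcome standing in for Training.Data.old$Training.match
outcome <- c(1, 0, 0, 1, 0)

# Labelling as in the question: 'no match' contains a space,
# so it is not a syntactically valid R variable name
bad <- factor(outcome, levels = c(0, 1), labels = c("no match", "match"))

# make.names() shows what each level is turned into to become a valid name
valid.levels <- make.names(levels(bad))
print(valid.levels)

# data.frame() applies the same conversion to column names
# (check.names = TRUE is its default)
probs <- data.frame("no match" = 0.8, "match" = 0.2)
print(names(probs))

# Possible fix: relabel the factor so the levels are already valid names
good <- bad
levels(good) <- valid.levels

# Assumption: a paraphrase of caret's check, comparing the levels
# with their make.names() version
levels.ok <- all(levels(good) == make.names(levels(good)))
print(levels.ok)
```

Applied to the question, this would mean passing labels such as c('no.match', 'match') (or wrapping the labels in make.names()) when building Training.match, after which the train() call with classProbs enabled should get past this check.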