Error in `contrasts<-` when using predict

Posted 2020-04-10 02:45

Question:

I have trained a model and am trying to use the predict function, but it returns the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels

There are several questions on SO and CrossValidated about this. As far as I can tell, the error means that one factor in my model has only one level.

This is a pretty simple model, with one continuous variable (driveTime) and one factor variable (Market.y) that has 3 levels:

 driveTime         Market.y      transfer
 Min.   : 5.100   Dallas :10   Min.   :-11.205  
 1st Qu.: 6.192   McAllen: 6   1st Qu.:  3.575  
 Median : 7.833   Tulsa  : 3   Median :  7.843  
 Mean   : 8.727                Mean   :  8.883  
 3rd Qu.:10.725                3rd Qu.: 15.608  
 Max.   :14.350                Max.   : 30.643

When I use the predict function to determine an outcome for an unseen sample:

newDriveTime <- data.frame(driveTime =  8,Market.y = as.factor("Dallas"))
predict(bestMod_Rescaled, newDriveTime)

I get the following error:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

Here is more of my workflow:

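library(e1071)  # provides svm(), tune() and tune.control()

# 10-fold cross-validation for the tuning runs
tc <- tune.control(cross = 10, fix = 8/10)

# Grid search over epsilon and cost for the SVM
tuneResult_Rescaled <- tune(svm, transfer ~ driveTime + Market.y,
                            data = finalSubset,
                            ranges = list(epsilon = seq(0.1, 0.5, 0.1),
                                          cost = seq(8, 10, 0.1)),
                            tunecontrol = tc)

summary(tuneResult_Rescaled)

# Keep the model with the best cross-validated performance
bestMod_Rescaled <- tuneResult_Rescaled$best.model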

Answer 1:

I think you have to give the factor in the new data the same levels as in the training data. Something like the following should work:

newDriveTime <- data.frame(driveTime = 8,
                           Market.y = factor("Dallas",
                                             levels = levels(finalSubset$Market.y)))

predict(bestMod_Rescaled, newDriveTime)
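
The effect can be reproduced without the SVM at all. The following is only a minimal sketch: it assumes that predict builds a design matrix from the new data with model.matrix, and the level names in market_levels are copied by hand from the summary in the question rather than read from finalSubset.

```r
# Assumed level set, copied from the summary output in the question.
market_levels <- c("Dallas", "McAllen", "Tulsa")

# One-row data frame whose factor has a single level: building the
# design matrix fails, and the error message is captured as a string.
one_level <- data.frame(driveTime = 8, Market.y = factor("Dallas"))
err <- tryCatch(
  model.matrix(~ driveTime + Market.y, data = one_level),
  error = function(e) conditionMessage(e)
)
print(err)

# Same row, but the factor carries all three levels: the design
# matrix can be built, with one dummy column per non-reference level.
all_levels <- data.frame(driveTime = 8,
                         Market.y = factor("Dallas", levels = market_levels))
mm <- model.matrix(~ driveTime + Market.y, data = all_levels)
print(colnames(mm))
```

The only difference between the two data frames is the levels attribute of Market.y, which is what the fix above supplies.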

In R, factors are stored as integer codes with labels (the levels). If two factor vectors have different numbers of levels, one cannot tell from the labels alone which integer codes correspond to each other in the two vectors.
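
As a small illustration of the integer coding (a sketch with made-up vectors, not the data from the question), the same labels get different codes when the level sets differ, and rebuilding the factor with a shared level set brings them back in line:

```r
# Two factors with the same labels but different level sets (illustrative data).
a <- factor(c("Dallas", "Tulsa"))
b <- factor(c("Dallas", "Tulsa"),
            levels = c("Dallas", "McAllen", "Tulsa"))

codes_a <- as.integer(a)  # codes relative to levels(a): Dallas, Tulsa
codes_b <- as.integer(b)  # codes relative to levels(b): Dallas, McAllen, Tulsa
print(codes_a)
print(codes_b)

# Rebuild `a` from its labels using the level set of `b`,
# so that both factors share the same coding.
a_aligned <- factor(as.character(a), levels = levels(b))
print(as.integer(a_aligned))
```

Here "Tulsa" is code 2 in a but code 3 in b, which is why the new data needs the training levels attached.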