randomForest does not work when training set has m

Posted 2019-05-01 16:30

When I try to test my trained model on new test data that has fewer factor levels than my training data, predict() returns the following error:

Type of predictors in new data do not match that of the training data.

My training data has a variable with 7 factor levels and my test data has that same variable with 6 factor levels (all 6 ARE in the training data).

When I add an observation containing the "missing" 7th factor level, prediction works, so I'm not sure why this happens or what the logic behind it is.

I could understand randomForest choking if the test set had more or different factor levels, but why does it fail when the training set has "more" levels?
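The situation described in the question can be reproduced without fitting a model. The sketch below uses made-up data (the variable names `train_val` and `test_val` and the level names `a` to `g` are hypothetical). It shows that a factor built from only the six values present in the test data has six levels, and that the integer codes behind those levels no longer line up with the training factor once a level in the middle is missing.

```r
# Hypothetical training factor with 7 levels, "a" through "g"
train_val <- factor(c("a", "b", "c", "d", "e", "f", "g"))

# Hypothetical test factor containing only 6 of those values ("d" is absent)
test_val <- factor(c("a", "b", "c", "e", "f", "g"))

nlevels(train_val)  # 7
nlevels(test_val)   # 6

# The integer code stored for "e" differs between the two factors:
# it is the 5th level in training but only the 4th level in the test factor.
as.integer(train_val[train_val == "e"])  # 5
as.integer(test_val[test_val == "e"])    # 4
```

Because the codes shift, a model whose splits were learned on the training codes could not read the test factor as-is, which is consistent with predict() rejecting the input instead of guessing.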

Tags: r random-forest

1 answer

别忘想泡老子

Reply #2 · 2019-05-01 17:03

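randomForest's predict() expects the training and the test data to have exactly the same factor levels, even if one of the sets has no observations for a given level or levels. Its check compares the number of levels of each factor predictor, so a test factor with 6 levels does not match a training factor with 7, even though every test level also exists in training. In your case, since the test dataset is missing a level that the training data has, you can run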

test$val <- factor(test$val, levels=levels(train$val))

to make sure the test factor has all the same levels and that they are coded the same way.
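The one-liner above handles a single column. A minimal sketch that applies the same fix to every factor column shared by two data frames is shown below; the helper name `align_factor_levels` and the example data are hypothetical, and the code uses base R only. Under this approach, test values that never appeared in training would silently become `NA`, so the sketch stops with an error listing them instead.

```r
# Hypothetical helper: re-level every factor column in `test` so that it uses
# exactly the levels (and level order) of the same column in `train`.
align_factor_levels <- function(train, test) {
  for (col in intersect(names(train), names(test))) {
    if (is.factor(train[[col]])) {
      new_col <- factor(test[[col]], levels = levels(train[[col]]))
      # Non-missing values that are not training levels turn into NA here
      unseen <- !is.na(test[[col]]) & is.na(new_col)
      if (any(unseen)) {
        stop(sprintf("column '%s' has levels not seen in training: %s", col,
                     paste(unique(as.character(test[[col]][unseen])), collapse = ", ")))
      }
      test[[col]] <- new_col
    }
  }
  test
}

# Hypothetical data: 7 levels in training, only 6 of them in the test set
train <- data.frame(val = factor(letters[1:7]), x = 1:7)
test  <- data.frame(val = factor(c("a", "c", "e", "g", "b", "f")), x = 1:6)

test <- align_factor_levels(train, test)
nlevels(test$val)                               # 7
identical(levels(test$val), levels(train$val))  # TRUE
```

The aligned `test` data frame can then be passed to predict() as usual. Stopping on unseen levels is a design choice; dropping or recoding those rows would be reasonable alternatives depending on the data.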

(reposted here to close out the question)
