Random forest package in R shows error during pred

Posted 2019-01-24 16:51

My training data has a predictor with 30 factor levels. The same predictor in my test data also has 30 factor levels, but some of the levels are different. randomForest will not predict unless the levels match exactly. It fails with this error:

Error in predict.randomForest(model,test) New factor levels not present in the training data

Tags: r random-forest

3 answers

Emotional °昔

#2 · 2019-01-24 17:38

Use this to make the levels match (here test and train refer to the corresponding columns in the testing and training datasets, not to the whole data frames):

test <- factor(test, levels = levels(train))
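A minimal sketch of what that call does, on made-up vectors (train_col and test_col are hypothetical names, not from the question). One thing worth knowing: any test value with no matching training level becomes NA, so those rows would still need separate handling before predicting.

```r
# Hypothetical example columns; names and values are illustrative only.
train_col <- factor(c("a", "b", "c", "a"))
test_col  <- factor(c("a", "c", "d"))   # "d" never appears in training

# Re-declare the test factor using the training levels.
test_aligned <- factor(test_col, levels = levels(train_col))

# The level sets now match exactly.
identical(levels(test_aligned), levels(train_col))

# Entries that had no training level are now NA.
is.na(test_aligned)
```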


来，给爷笑一个

#3 · 2019-01-24 17:40

One workaround I've found is to first convert the factor variables in your train and test sets to character:

test$factor <- as.character(test$factor)
train$factor <- as.character(train$factor)

Then add a column to each with a flag for test/train, i.e.

test$isTest <- rep(1,nrow(test))
train$isTest <- rep(0,nrow(train))

Then rbind them

fullSet <- rbind(test,train)

Then convert back to a factor

fullSet$factor <- as.factor(fullSet$factor)

This will ensure that both the test and train sets have the same levels. Then you can split back off:

test.new <- fullSet[fullSet$isTest==1,]
train.new <- fullSet[fullSet$isTest==0,]

and you can drop/NULL out the isTest column from each. Then you'll have sets with identical levels you can train and test on. There might be a more elegant solution, but this has worked for me in the past and you can write it into a little function if you need to repeat it often.
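The steps above could be wrapped into a small function along these lines. This is only a sketch: align_factor_levels and its arguments are hypothetical names, it handles a single column, and it assumes train and test have the same columns. The model would presumably need to be fitted on the returned training set so that it knows about the shared level set.

```r
# Hypothetical helper: returns copies of train and test in which the column
# named by `col` is a factor with one shared level set (the union of both).
align_factor_levels <- function(train, test, col) {
  train[[col]] <- as.character(train[[col]])
  test[[col]]  <- as.character(test[[col]])

  # Flag the origin of each row so the combined set can be split again.
  train$isTest <- 0
  test$isTest  <- 1

  fullSet <- rbind(test, train)
  fullSet[[col]] <- as.factor(fullSet[[col]])

  test.new  <- fullSet[fullSet$isTest == 1, ]
  train.new <- fullSet[fullSet$isTest == 0, ]

  # Drop the helper column before returning.
  test.new$isTest  <- NULL
  train.new$isTest <- NULL

  list(train = train.new, test = test.new)
}

# Illustrative data; column names and values are made up.
train <- data.frame(f = c("a", "b", "a"), y = c(1, 2, 3))
test  <- data.frame(f = c("b", "c"), y = c(4, 5))

aligned <- align_factor_levels(train, test, "f")
identical(levels(aligned$train$f), levels(aligned$test$f))
```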


小情绪 Triste *

#4 · 2019-01-24 17:40

A simple solution would be to rbind your test data with the training data, run the prediction, and then subset the test rows out of the predictions. I have tested this method.
