R: Random forest predict fails with NAs in predictors

Posted 2019-07-27 12:04

Question:

The documentation (if I'm reading it correctly) says that the random forest predict function produces NA predictions for observations that have NA predictors.

NOTE: If the object inherits from randomForest.formula, then any data with NA are silently omitted from the prediction. The returned value will contain NA correspondingly in the aggregated and individual tree predictions (if requested), but not in the proximity or node matrices

However, if I try to use the predict function on a dataset with some NA's in predictors [NA's in 7 observations out of 2688] I encounter the following error condition, and prediction fails.

Error in predict.randomForest(model, new.ds) : missing values in newdata

There is a slightly messy work-around that I would like to avoid if possible.

Am I doing/reading something wrong? Does it have something to do with the "inherits from randomForest.formula" clause?

Answer 1:

Using some examples from the documentation:

library(randomForest)

set.seed(1)
x <- data.frame(x1=gl(32, 5), x2=runif(160), y=rnorm(160))

# Fitted with the default (x, y) interface
rf1 <- randomForest(x[-3], x[[3]], ntree=10)
inherits(rf1, "randomForest.formula")
# [1] FALSE

# Fitted with the formula interface
iris.rf <- randomForest(Species ~ ., data=iris, importance=TRUE,
                        proximity=TRUE)
inherits(iris.rf, "randomForest.formula")
# [1] TRUE

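So you probably fitted your model by calling randomForest with the default (x, y) interface rather than the formula interface. Only models fitted through the formula interface inherit from randomForest.formula, so the NA handling described in the note does not apply to yours, and predict stops with the "missing values in newdata" error instead.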
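To see the difference side by side, here is a minimal sketch that reuses the toy data from the documentation example. The object names (newdata, rf_xy, rf_formula, pred_xy) and the injected NA are my own, for illustration only. It fits the same data through both interfaces, predicts on rows containing a missing predictor, and then shows one possible fallback for a model that was not fitted with a formula: predict only on the complete rows and fill the rest with NA. This is not necessarily the work-around you had in mind.

```r
library(randomForest)

set.seed(1)
x <- data.frame(x1 = gl(32, 5), x2 = runif(160), y = rnorm(160))

# Hypothetical new data: five rows, with one predictor value set to NA
newdata <- x[1:5, ]
newdata$x2[2] <- NA

# Same data, fitted through the two interfaces
rf_xy      <- randomForest(x[-3], x[[3]], ntree = 10)    # default (x, y) interface
rf_formula <- randomForest(y ~ ., data = x, ntree = 10)  # formula interface

# Default interface: predict() stops when newdata contains NA
res_xy <- try(predict(rf_xy, newdata[-3]), silent = TRUE)
inherits(res_xy, "try-error")

# Formula interface: rows with NA are dropped internally and, per the
# documentation note, come back as NA in the returned predictions
pred <- predict(rf_formula, newdata)
is.na(pred)

# Possible fallback for the non-formula model (an assumption, not part of
# the package): predict on complete rows only and pad the rest with NA
ok <- complete.cases(newdata[-3])
pred_xy <- rep(NA_real_, nrow(newdata))
pred_xy[ok] <- predict(rf_xy, newdata[ok, -3])
pred_xy
```

If refitting is an option, switching to the formula interface is probably the least intrusive fix, since predict then handles the incomplete rows for you.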