I'm trying to run a randomForest on a large-ish data set (5000x300). Unfortunately I'm getting an error message as follows:
> RF <- randomForest(prePrior1, postPrior1[,6]
+ ,,do.trace=TRUE,importance=TRUE,ntree=100,,forest=TRUE)
Error in randomForest.default(prePrior1, postPrior1[, 6], , do.trace = TRUE, :
NA/NaN/Inf in foreign function call (arg 1)
So I tried to find any NAs using:
> df2 <- prePrior1[is.na(prePrior1)]
> df2
character(0)
> df2 <- postPrior1[is.na(postPrior1[,6])]
> df2
numeric(0)
which leads me to believe that Infs are the problem, as there don't seem to be any NAs.
Any suggestions for how to root out Infs?
You're probably looking for `is.finite`, though I'm not 100% certain that the problem is Infs in your input data.

Be sure to read the help for `is.finite` carefully about which combinations of missing, infinite, etc. it picks out. Specifically, `is.finite` returns FALSE for NA and NaN as well as for Inf and -Inf, whereas `is.na` returns TRUE for both NA and NaN. Not surprisingly, there's an `is.nan` function as well.

In analogy to `is.na`, you can use `is.infinite` to find occurrences of infinite values.

Take a look at `with`, which evaluates an expression using the columns of a data frame.

joran's answer is what you want, and it is informative. For more details about `is.na()` and `is.infinite()`, check out https://stat.ethz.ch/R-manual/R-devel/library/Matrix/html/is.na-methods.html. Besides, once you have the logical vector that says whether each element of the original vector is NA/Inf, you can use the `which()` function to get the indices of those elements.

The documentation for `which()` is here: https://stat.ethz.ch/R-manual/R-devel/library/base/html/any.html

randomForest's 'NA/NaN/Inf in foreign function call' is often a false warning, and really irritating.
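To tie the checks from this thread together, here is a minimal sketch. The toy data frame `df` and the helper `find_bad_columns` are made up for illustration and are not from the original posts. It is also only an assumption that non-numeric columns matter here: `prePrior1[is.na(prePrior1)]` returning `character(0)` rather than `numeric(0)` suggests at least one column is not numeric, and character predictors are a plausible trigger for this error even when no NA/NaN/Inf is present.

```r
# Hypothetical toy data standing in for prePrior1; column names are made up.
df <- data.frame(
  a = c(1, 2, Inf, 4),
  b = c(0.5, NaN, 1.5, NA),
  c = c("x", "y", "z", "w"),
  stringsAsFactors = FALSE
)

# How the four predicates treat the special values.
x <- c(1, NA, NaN, Inf, -Inf)
is.finite(x)    # TRUE only for ordinary numbers
is.na(x)        # TRUE for NA and NaN
is.nan(x)       # TRUE for NaN only
is.infinite(x)  # TRUE for Inf and -Inf only

# Hypothetical helper: report each column's class and, for numeric
# columns, how many entries are not finite (NA, NaN, Inf or -Inf).
find_bad_columns <- function(d) {
  data.frame(
    column = names(d),
    class = vapply(d, function(col) class(col)[1], character(1)),
    n_nonfinite = vapply(d, function(col) {
      if (is.numeric(col)) sum(!is.finite(col)) else NA_integer_
    }, integer(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

report <- find_bad_columns(df)
print(report)

# which() turns the logical vector into row indices for a single column.
bad_rows_a <- which(!is.finite(df$a))
```

Columns whose class is not numeric or factor, or whose `n_nonfinite` is above zero, would be the first ones to look at.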
A fast and dirty trick to narrow things down: do a binary search on your variable list, and use token parameters like `ntree=2` to get an instant pass/fail on the subset of variables.
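The binary search could be sketched roughly as follows. The helper `find_failing_column` and the predicate `fits_ok` are hypothetical names, not part of randomForest. So that the sketch runs without the package, the demo uses a stand-in predicate on toy data; the commented-out predicate shows how an actual `randomForest` call with `ntree = 2` might be plugged in, assuming `x` and `y` hold your predictors and response.

```r
# Hypothetical helper: bisect a vector of column names to find one column
# for which fits_ok() fails. If several columns are bad it returns one of
# them; drop it and rerun to find the next.
find_failing_column <- function(cols, fits_ok) {
  if (fits_ok(cols)) return(NULL)  # the full set passes, nothing to find
  while (length(cols) > 1) {
    half <- cols[seq_len(length(cols) %/% 2)]
    # Keep whichever half still fails; if the first half passes, the
    # culprit must be among the remaining columns.
    cols <- if (!fits_ok(half)) half else setdiff(cols, half)
  }
  cols
}

# With randomForest the predicate might look like this (assumes x and y
# exist; ntree = 2 keeps every attempt nearly instant):
# fits_ok <- function(cols) {
#   tryCatch({
#     randomForest::randomForest(x[, cols, drop = FALSE], y, ntree = 2)
#     TRUE
#   }, error = function(e) FALSE)
# }

# Stand-in demo on made-up data: "passing" means every column is numeric.
toy <- data.frame(
  v1 = 1:4,
  v2 = c(1, 2, 3, 4),
  v3 = c("a", "b", "c", "d"),
  v4 = c(0.1, 0.2, 0.3, 0.4),
  stringsAsFactors = FALSE
)
stand_in_ok <- function(cols) all(vapply(toy[cols], is.numeric, logical(1)))

culprit <- find_failing_column(names(toy), stand_in_ok)
print(culprit)
```

With 300 variables this takes about nine rounds of halving per offending column, which is why a tiny `ntree` is useful.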