randomForest Error: NA not permitted in predictors

Posted 2019-09-09 02:52

Question:

I am trying to run the 'genie3' algorithm (ref: http://homepages.inf.ed.ac.uk/vhuynht/software.html) in R, which uses the 'randomForest' method.

I am running into the following Error:

> weight.matrix<-get.weight.matrix(tmpLog2FC, input.idx=1:4551)
Starting RF computations with 1000 trees/target gene,
and 67 candidate input genes/tree node
Computing gene 1/11805
Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE,  : 
NA not permitted in predictors 

So I checked if NAs are present in my data, and there are none:

> NAs<-sapply(tmpLog2FC, function(x) sum(is.na(x)))
> length(which(NAs!=0))
[1] 0
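
A slightly broader check may be worth running at this point. The sketch below uses a hypothetical helper, report_missing() (not part of GENIE3 or randomForest), and made-up data. is.na() is also TRUE for NaN, so values that only appear after a transformation inside the function, such as standardising a gene with zero variance, would raise the same error even though the raw input is clean. Whether get.weight.matrix() does anything like this is an assumption; it can be verified by running the helper on x just before the randomForest() call.

```r
# Hypothetical helper (not part of GENIE3 or randomForest): summarise
# missing and non-finite values in a numeric matrix or data frame.
report_missing <- function(m) {
  m <- as.matrix(m)
  list(
    n_na  = sum(is.na(m)),                   # counts NA and NaN
    n_nan = sum(is.nan(m)),                  # counts NaN only
    n_inf = sum(is.infinite(m)),             # counts Inf and -Inf
    where = which(is.na(m), arr.ind = TRUE)  # row/col positions
  )
}

# Made-up data: column "g2" is constant, so it has zero variance.
toy <- cbind(g1 = c(1, 2, 3), g2 = c(5, 5, 5))
report_missing(toy)$n_na      # the raw input has no missing values

# Standardising a zero-variance column divides 0 by 0 and gives NaN.
scaled <- apply(toy, 2, function(v) (v - mean(v)) / sd(v))
report_missing(scaled)$n_na   # the constant column is now all NaN
report_missing(scaled)$where  # positions of the NaN entries
```

The same helper can be called on tmpLog2FC itself; the `where` element gives the row and column of each offending entry.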

I then tried editing the specific 'get.weight.matrix()' function to omit NAs (just in case) by changing this line:

rf <- randomForest(x, y, mtry=mtry, ntree=nb.trees, importance=TRUE, ...)

To:

rf <- randomForest(x, y, mtry=mtry, ntree=nb.trees, importance=TRUE, na.action=na.omit)

I then sourced the code, and double checked that it incorporated the changes by calling it on its own (and displaying the actual script):

    }
    target.gene.name <- gene.names[target.gene.idx]
    # remove target gene from input genes
    these.input.gene.names <- setdiff(input.gene.names, target.gene.name)
    x <- expr.matrix[,these.input.gene.names]
    y <- expr.matrix[,target.gene.name]
    rf <- randomForest(x, y, mtry=mtry, ntree=nb.trees, importance=TRUE, na.action=na.omit)

However, when I re-run it, I get the same error:

Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE,  : 
NA not permitted in predictors 

Has anyone encountered anything similar to this? Any ideas on what I can do?

Thanks in advance.

*EDIT: As suggested, I re-ran with debug:

> weight.matrix<-get.weight.matrix(tmpLog2FC, input.idx=1:4551)
Starting RF computations with 1000 trees/target gene,
and 67 candidate input genes/tree node
Computing gene 1/11805
Error in randomForest.default(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE,  : 
NA not permitted in predictors
Called from: randomForest(x, y, mtry = mtry, ntree = nb.trees, importance = TRUE, 
na.action = na.omit)
Browse[1]> 
> 

The debugger shows that the line I suspected is the one throwing the error, and that it is running in its edited form with 'na.action=na.omit'. I am even more confused: how can a dataset with no NAs, run through code that is told to omit NAs, produce this error?

Answer 1:

You can use the following command to list the rows in which any predictor has no value:

data[!complete.cases(data),]

Check those rows carefully. In my case, rows with no values, i.e. ",,,,,,,,," (the predictor columns in my file were comma-separated), showed up as NA at the time of the RF run.

You can then delete those rows.
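
A minimal sketch of that clean-up, using made-up data and the illustrative names x_clean and y_clean. It filters the rows before the call instead of relying on na.action: as far as I can tell from the package documentation, na.action belongs to the formula interface, so an x, y call (randomForest.default, as in the traceback above) may simply ignore it. Treat that as an assumption to check against your installed version.

```r
# Made-up predictors and response; one predictor row contains an NA.
x <- data.frame(a = c(1.0, 2.5, NA, 4.0, 5.5, 6.0),
                b = c(0.3, 0.1, 0.7, 0.9, 0.2, 0.4))
y <- c(10, 12, 9, 15, 14, 13)

# Show the offending rows, as in data[!complete.cases(data), ].
x[!complete.cases(x), ]

# Keep only rows where every predictor and the response are present.
keep    <- complete.cases(x) & !is.na(y)
x_clean <- x[keep, , drop = FALSE]
y_clean <- y[keep]

# Fit only if the randomForest package is installed.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf <- randomForest::randomForest(x_clean, y_clean,
                                   ntree = 50, importance = TRUE)
}
```

If an entire column is missing, dropping rows would leave no data at all; in that case removing the column is the alternative.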

Thanks