I'd like to remove the lines in this data frame that:

a) contain `NA`s in any column. Below is my example data frame.
```
             gene hsap mmul mmus rnor cfam
1 ENSG00000208234    0   NA   NA   NA   NA
2 ENSG00000199674    0    2    2    2    2
3 ENSG00000221622    0   NA   NA   NA   NA
4 ENSG00000207604    0   NA   NA    1    2
5 ENSG00000207431    0   NA   NA   NA   NA
6 ENSG00000221312    0    1    2    3    2
```
Basically, I'd like to get a data frame such as the following.
```
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
6 ENSG00000221312    0    1    2    3    2
```
b) contain `NA`s in specific columns only (for example, `rnor` and `cfam`), so I can also get this result:
```
             gene hsap mmul mmus rnor cfam
2 ENSG00000199674    0    2    2    2    2
4 ENSG00000207604    0   NA   NA    1    2
6 ENSG00000221312    0    1    2    3    2
```
If you want control over how many NAs are acceptable in each row, try a small function that counts the NAs in each row. In many survey data sets, too many blank responses can ruin the results, so such rows are deleted once they pass a certain threshold. This approach lets you choose how many NAs a row can have before it is deleted. By default it eliminates every row that contains an NA, or you can specify the maximum number of NAs allowed.
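A minimal sketch of such a threshold function, assuming the question's data is stored as `final`. The helper name `delete_na_rows` and its argument `n` are hypothetical; the idea is simply to count NAs per row with `rowSums(is.na(DF))` and keep the rows at or below the threshold.

```r
# Example data from the question
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows that have at most `n` NAs
delete_na_rows <- function(DF, n = 0) {
  na_per_row <- rowSums(is.na(DF))   # number of NAs in each row
  DF[na_per_row <= n, , drop = FALSE]
}

all_complete <- delete_na_rows(final)         # default n = 0: rows 2 and 6
up_to_two    <- delete_na_rows(final, n = 2)  # rows 2, 4 and 6
```

With `n = 2` this happens to reproduce result b) on the question's data, because row 4 has exactly two NAs while rows 1, 3 and 5 have four each.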
Try `na.omit(your.data.frame)`. As for the second question, try posting it as a separate question (for clarity).

Another option, if you want greater control over how rows are deemed invalid, is to index the rows with a logical condition built from `is.na()` on the columns you care about.
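A sketch of both ideas on the question's data (rebuilt here as `final`, an assumed name): `na.omit()` gives result a), and a boolean condition on `rnor` and `cfam` gives result b). Note that in the question's data rows 1, 3 and 5 all have NAs in both of those columns, so all three are dropped here.

```r
# Example data from the question
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2),
  stringsAsFactors = FALSE
)

# a) na.omit() drops every row with an NA in any column
a <- na.omit(final)

# b) keep a row unless both rnor AND cfam are NA
b <- final[!is.na(final$rnor) | !is.na(final$cfam), ]
```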
Using the above, this:
Becomes:
...where only row 5 is removed, since it is the only row containing NAs in both `rnor` AND `cfam`. The boolean logic can then be changed to fit specific requirements.

Using the dplyr package, we can also filter out NAs with `filter()` and `is.na()`.
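A sketch with dplyr, assuming dplyr 1.0.4 or later (needed for `if_all()`) and the question's data stored as `final`. Part a) requires every column to be non-missing; part b) only looks at `rnor` and `cfam`.

```r
suppressPackageStartupMessages(library(dplyr))

# Example data from the question
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2),
  stringsAsFactors = FALSE
)

# a) keep rows where every column is non-missing
a <- final %>% filter(if_all(everything(), ~ !is.na(.x)))

# b) keep rows where rnor and cfam are both non-missing
b <- final %>% filter(if_all(c(rnor, cfam), ~ !is.na(.x)))
```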
Another approach is a function that deletes all the rows of the data frame that have 'NA' in any column and returns the resulting data. If you want to check for multiple values, such as `NA` and `?`, change `dart=c('NA')` in the function parameters to `dart=c('NA', '?')`.
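A sketch of such a `dart`-based function; the name `drop_dirty_rows` and the `survey` data are hypothetical. One design note: in R a real missing value is not the string `'NA'` (`NA %in% 'NA'` is `FALSE`), so this version always checks `is.na()` and uses `dart` only for placeholder values such as `'?'`.

```r
# Hypothetical survey-style data with a "?" placeholder
survey <- data.frame(
  id     = 1:4,
  answer = c("yes", "?", "no", NA),
  score  = c(1, 2, NA, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop rows containing a real NA or any value listed in `dart`
drop_dirty_rows <- function(DF, dart = c("NA")) {
  # one logical vector per column: TRUE where the cell is NA or matches `dart`
  dirty_cols <- lapply(DF, function(col) is.na(col) | as.character(col) %in% dart)
  # a row is dirty if any of its cells is dirty
  dirty_rows <- Reduce(`|`, dirty_cols)
  DF[!dirty_rows, , drop = FALSE]
}

kept_default <- drop_dirty_rows(survey)                       # rows 1 and 2
kept_strict  <- drop_dirty_rows(survey, dart = c("NA", "?"))  # row 1 only
```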
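I prefer the following way to check whether rows contain any NAs: apply `function(x) any(is.na(x))` to each row with `apply(final, 1, ...)`. This returns a logical vector whose values indicate whether each row contains any NA. You can `sum()` it to see how many rows you'll have to drop, and eventually drop them by indexing the data frame with its negation.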
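A sketch of this row-wise check on the question's data, stored as `final`; `row.has.na`, `n_dropped` and `final.filtered` are illustrative names. `apply()` first converts the data frame to a character matrix, but NAs survive that conversion, so `is.na()` still works.

```r
# Example data from the question
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2),
  stringsAsFactors = FALSE
)

# TRUE for each row that contains at least one NA
row.has.na <- apply(final, 1, function(x) any(is.na(x)))

# how many rows would be dropped (3 here: rows 1, 3 and 5)
n_dropped <- sum(row.has.na)

# keep only the rows without NAs
final.filtered <- final[!row.has.na, ]
```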
For filtering rows on NAs in only some of the columns, it becomes a little trickier (for example, you can feed `final[,5:6]` to `apply`). Generally, Joris Meys' solution seems to be more elegant.
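The same check restricted to columns 5 and 6 (`rnor` and `cfam`) could look like this, again assuming the question's data as `final` (`row.has.na.56` and `final.b` are illustrative names). On this data it reproduces result b).

```r
# Example data from the question
final <- data.frame(
  gene = c("ENSG00000208234", "ENSG00000199674", "ENSG00000221622",
           "ENSG00000207604", "ENSG00000207431", "ENSG00000221312"),
  hsap = c(0, 0, 0, 0, 0, 0),
  mmul = c(NA, 2, NA, NA, NA, 1),
  mmus = c(NA, 2, NA, NA, NA, 2),
  rnor = c(NA, 2, NA, 1, NA, 3),
  cfam = c(NA, 2, NA, 2, NA, 2),
  stringsAsFactors = FALSE
)

# only inspect columns 5 and 6 (rnor and cfam)
row.has.na.56 <- apply(final[, 5:6], 1, function(x) any(is.na(x)))

# rows 2, 4 and 6 remain
final.b <- final[!row.has.na.56, ]
```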