I'm having trouble with a data frame and couldn't really resolve that issue myself:
The dataframe has arbitrary properties as columns and each row represents one data set.
The question is:
How to get rid of columns where for ALL rows the value is NA?
Try this:
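One common base-R idiom for this is shown below as a minimal sketch; the data frame df and its contents are hypothetical. is.na(df) gives a logical matrix, colSums() counts the NAs in each column, and a column is kept only if its NA count is smaller than nrow(df).

```r
# Hypothetical example data: column b is NA in every row
df <- data.frame(
  a = c(1, 2, 3),
  b = c(NA, NA, NA),
  c = c("x", NA, "z")
)

# Count NAs per column and keep columns that are not NA in every row.
# drop = FALSE keeps the result a data frame even if only one column survives.
df <- df[, colSums(is.na(df)) < nrow(df), drop = FALSE]

print(names(df))  # [1] "a" "c"
```

Column c is kept because it has only one NA out of three rows.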
I hope this may also help. It could be made into a single command, but I found it easier to read when split into two commands. I made a function with the following instructions and it worked lightning fast.
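library(data.table)
naColsRemoval = function(DataTable) {
  # indices of columns where every value is NA
  na.cols = DataTable[, .(which(apply(is.na(.SD), 2, all)))]
  # drop them by reference; (LHS) replaces the deprecated with = F
  DataTable[, (unlist(na.cols)) := NULL]
}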
.SD lets you limit the check to part of the table if you wish (for example with .SDcols), but by default it takes the whole table.
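A sketch of restricting the check with .SDcols, assuming the data.table package is installed; the table DT and the column subset check are hypothetical. Column x is entirely NA but survives because it is not in check.

```r
library(data.table)

# Hypothetical table: x and y are all NA, z has one NA
DT <- data.table(id = 1:3, x = NA, y = NA, z = c("a", NA, "c"))

# Only these columns are examined (hypothetical subset)
check <- c("y", "z")

# j returns a named logical vector: TRUE where the column is all NA
all_na <- DT[, vapply(.SD, function(col) all(is.na(col)), logical(1)),
             .SDcols = check]

# Delete the all-NA columns among those checked, by reference
na.cols <- check[all_na]
if (length(na.cols) > 0) DT[, (na.cols) := NULL]

print(names(DT))  # [1] "id" "x"  "z"
```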
The accepted answer does not work with non-numeric columns. Based on this answer, the following works with columns containing different data types:
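A minimal sketch of a type-agnostic check, assuming a hypothetical data frame df that mixes numeric, character, factor and Date columns. Each column is tested on its own with vapply(), and is.na() works for all of these types.

```r
# Hypothetical data frame with mixed column types
df <- data.frame(
  num  = c(1.5, NA, 3),
  chr  = c(NA_character_, NA_character_, NA_character_),
  fct  = factor(c("lo", NA, "hi")),
  date = as.Date(rep(NA_character_, 3))
)

# TRUE for each column whose values are all NA
all_na <- vapply(df, function(col) all(is.na(col)), logical(1))

# Keep only columns with at least one non-NA value
df <- df[, !all_na, drop = FALSE]

print(names(df))  # [1] "num" "fct"
```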
The two approaches offered thus far fail with large data sets as (amongst other memory issues) they create is.na(df), which will be an object the same size as df. Here are two approaches that are more memory and time efficient:
An approach using Filter, and an approach using data.table (for general time and memory efficiency).
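A minimal base-R sketch of the Filter idea, assuming a hypothetical data frame df. Filter() applies the predicate to one column at a time and keeps the columns for which it returns TRUE, so no full-size is.na(df) matrix is built.

```r
# Hypothetical example data: column b is NA in every row
df <- data.frame(
  a = c(1, NA, 3),
  b = NA,
  c = c("x", "y", NA)
)

# Keep columns where not every value is NA; each column is checked separately
df <- Filter(function(col) !all(is.na(col)), df)

print(names(df))  # [1] "a" "c"
```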
Examples using large data (30 columns, 1e6 rows):
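A scaled-down benchmark sketch comparing the full is.na() matrix approach with the column-by-column Filter() approach. The data are hypothetical (every third column all NA) and use 1e5 rows so they run quickly; set n_rows to 1e6 to match the size above, which needs a few hundred MB of RAM. Timings depend on the machine, so none are quoted here.

```r
set.seed(1)
n_rows <- 1e5   # raise to 1e6 for the larger example
n_cols <- 30

# Build a named list first so as.data.frame() does not deparse the vectors
cols <- lapply(seq_len(n_cols), function(j) {
  if (j %% 3 == 0) rep(NA_real_, n_rows) else runif(n_rows)
})
names(cols) <- paste0("V", seq_len(n_cols))
big <- as.data.frame(cols)

# Builds a logical matrix the same size as big
t_matrix <- system.time(
  res_matrix <- big[, colSums(is.na(big)) < nrow(big), drop = FALSE]
)

# Checks one column at a time
t_filter <- system.time(
  res_filter <- Filter(function(col) !all(is.na(col)), big)
)

print(c(matrix = t_matrix[["elapsed"]], filter = t_filter[["elapsed"]]))
```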
Another way would be to use the apply() function. If you have the data.frame, then you can use apply() to see which columns fulfill your condition, and so you can simply do the same subsetting as in the answer by Musa, only with an apply approach.
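A minimal sketch of the apply() version, assuming a hypothetical data frame df. Note that is.na(df) is still a full-size logical matrix here, so the memory caveat about large data sets applies.

```r
# Hypothetical example data: column note is NA in every row
df <- data.frame(
  id    = 1:4,
  score = c(2.5, NA, 4.1, NA),
  note  = NA
)

# apply() with MARGIN = 2 walks over the columns of the logical matrix
# is.na(df) and returns TRUE for columns whose entries are all NA
all_na <- apply(is.na(df), 2, all)

# Same logical subsetting as before, just driven by apply()
df <- df[, !all_na, drop = FALSE]

print(names(df))  # [1] "id"    "score"
```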