I'm having trouble with a data frame and couldn't really resolve that issue myself:
The dataframe has arbitrary properties as columns and each row represents one data set.

The question is:
How to get rid of columns where for ALL rows the value is NA?

标签： r apply dataframe

7条回答

Luminary・发光体

2楼-- · 2019-01-07 03:30

Try this:

df <- df[,colSums(is.na(df))<nrow(df)]

0人赞添加讨论(0) 举报

太酷不给撩

3楼-- · 2019-01-07 03:41

I hope this may also help. It could be made into a single command, but I found it easier for me to read by dividing it in two commands. I made a function with the following instruction and worked lightning fast.

naColsRemoval = function (DataTable) { na.cols = DataTable [ , .( which ( apply ( is.na ( .SD ) , 2 , all ) ) )] DataTable [ , unlist (na.cols) := NULL , with = F] }

.SD will allow to limit the verification to part of the table, if you wish, but it will take the whole table as

0人赞添加讨论(0) 举报

【Aperson】

4楼-- · 2019-01-07 03:42

The accepted answer does not work with non-numeric columns. From this answer, the following works with columns containing different data types

Filter(function(x) !all(is.na(x)), df)

0人赞添加讨论(0) 举报

Fickle 薄情

5楼-- · 2019-01-07 03:46

df[sapply(df, function(x) all(is.na(x)))] <- NULL

0人赞添加讨论(0) 举报

SAY GOODBYE

6楼-- · 2019-01-07 03:52

The two approaches offered thus far fail with large data sets as (amongst other memory issues) they create is.na(df), which will be an object the same size as df.

Here are two approaches that are more memory and time efficient

An approach using Filter

Filter(function(x)!all(is.na(x)), df)

and an approach using data.table (for general time and memory efficiency)

library(data.table)
DT <- as.data.table(df)
DT[,which(unlist(lapply(DT, function(x)!all(is.na(x))))),with=F]

examples using large data (30 columns, 1e6 rows)

big_data <- replicate(10, data.frame(rep(NA, 1e6), sample(c(1:8,NA),1e6,T), sample(250,1e6,T)),simplify=F)
bd <- do.call(data.frame,big_data)
names(bd) <- paste0('X',seq_len(30))
DT <- as.data.table(bd)

system.time({df1 <- bd[,colSums(is.na(bd) < nrow(bd))]})
# error -- can't allocate vector of size ...
system.time({df2 <- bd[, !apply(is.na(bd), 2, all)]})
# error -- can't allocate vector of size ...
system.time({df3 <- Filter(function(x)!all(is.na(x)), bd)})
## user  system elapsed 
## 0.26    0.03    0.29 
system.time({DT1 <- DT[,which(unlist(lapply(DT, function(x)!all(is.na(x))))),with=F]})
## user  system elapsed 
## 0.14    0.03    0.18

0人赞添加讨论(0) 举报

神经病院院长

7楼-- · 2019-01-07 03:53

Another way would be to use the apply() function.

If you have the data.frame

df <- data.frame (var1 = c(1:7,NA),
                  var2 = c(1,2,1,3,4,NA,NA,9),
                  var3 = c(NA)
                  )

then you can use apply() to see which columns fulfill your condition and so you can simply do the same subsetting as in the answer by Musa, only with an apply approach.

> !apply (is.na(df), 2, all)
 var1  var2  var3 
 TRUE  TRUE FALSE 

> df[, !apply(is.na(df), 2, all)]
  var1 var2
1    1    1
2    2    2
3    3    1
4    4    3
5    5    4
6    6   NA
7    7   NA
8   NA    9

0人赞添加讨论(0) 举报

1 2 下一页

Remove columns from dataframe where ALL values are

examples using large data (30 columns, 1e6 rows)

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间