R: Remove multiple empty columns of character vari

I have a data frame where all the variables are of character type. Many of the columns are completely empty, i.e. only the variable headers are there, but no values. Is there any way to subset out the empty columns?

Tags: r is-empty isnullorempty

7 answers

放荡不羁爱自由

Reply #2 · 2019-01-17 18:41

It depends on what you mean by empty: is it NA or "", or can it even be " "? Something like this might work:

df[,!apply(df, 2, function(x) all(gsub(" ", "", x)=="", na.rm=TRUE))]
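
As a rough sketch of the same idea, the helper below treats NA, "", and whitespace-only strings all as empty. The function name drop_empty_cols and the sample data are made up for illustration. It uses vapply instead of apply so the data frame is not coerced to a matrix first, and drop = FALSE so the result stays a data frame even if only one column survives:

```r
# Hypothetical helper: drop columns whose every value is NA, "", or whitespace only.
drop_empty_cols <- function(df) {
  is_empty <- vapply(
    df,
    function(x) all(is.na(x) | trimws(as.character(x)) == ""),
    logical(1)
  )
  # drop = FALSE keeps a data frame even when a single column remains
  df[, !is_empty, drop = FALSE]
}

# Made-up example data
df <- data.frame(
  a = c("x", "y"),
  b = c("", " "),
  c = c(NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

names(drop_empty_cols(df))
#> [1] "a"
```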

啃猪蹄的小仙女

Reply #3 · 2019-01-17 18:42

If you know the column indices, you can use

df[,-c(3, 5, 7)]

This omits columns 3, 5, and 7.

疯言疯语

Reply #4 · 2019-01-17 18:47

If you're talking about columns where all values are NA, use remove_empty("cols") from the janitor package.

If you have character vectors where every value is the empty string "", you can first convert those values to NA throughout your data.frame with na_if from the dplyr package:

dat <- data.frame(
  x = c("a", "b", "c"),
  y = c("", "", ""),
  z = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

dat
#>   x y  z
#> 1 a   NA
#> 2 b   NA
#> 3 c   NA

library(dplyr)
library(janitor)

dat %>%
  # funs() is deprecated in current dplyr; limit na_if() to character columns
  mutate(across(where(is.character), ~ na_if(.x, ""))) %>%
  remove_empty("cols")
#>   x
#> 1 a
#> 2 b
#> 3 c

我欲成王，谁敢阻挡

Reply #5 · 2019-01-17 18:51

I have a similar situation -- I'm working with a large public records database but when I whittle it down to just the date range and category that I need, there are a ton of columns that aren't in use. Some are blank and some are NA.

The selected answer: https://stackoverflow.com/a/17672737/233467 didn't work for me, but this did:

df[!sapply(df, function (x) all(is.na(x) | x == ""))]

"骚年 ilove

Reply #6 · 2019-01-17 18:52

You can do either of the following:

emptycols <- sapply(df, function (k) all(is.na(k)))
df <- df[!emptycols]

or:

emptycols <- colSums(is.na(df)) == nrow(df)
df <- df[!emptycols]

If by empty you mean they are "", the second approach can be adapted like so:

emptycols <- colSums(df == "") == nrow(df)
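
One caveat with the "" variant: df == "" yields NA for any NA cell, so colSums returns NA for a column that mixes NA and "". A possible way to cover both cases at once, sketched here on made-up data, is to combine the two tests before summing:

```r
# Made-up example data: y is all "", z mixes NA and ""
df <- data.frame(
  x = c("a", "b", "c"),
  y = c("", "", ""),
  z = c(NA, "", NA),
  stringsAsFactors = FALSE
)

# TRUE where a cell is NA or the empty string (TRUE | NA evaluates to TRUE)
blank <- is.na(df) | df == ""
emptycols <- colSums(blank) == nrow(df)
df <- df[!emptycols]

names(df)
#> [1] "x"
```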

欢心

Reply #7 · 2019-01-17 18:52

Here is something that can be modified to exclude columns containing any of the specified values.

newdf <- df[, apply(df, 2, function(x) !any(is.na(x) | x == "" | x == "-4"))]
