How to remove rows where all columns are zero usin

2019-02-24 20:23发布

问题:

I have the following data frame:

dat <- structure(list(`A-XXX` = c(1.51653275922944, 0.077037240321129, 
0), `fBM-XXX` = c(2.22875185527511, 0, 0), `P-XXX` = c(1.73356698481106, 
0, 0), `vBM-XXX` = c(3.00397859609183, 0, 0)), .Names = c("A-XXX", 
"fBM-XXX", "P-XXX", "vBM-XXX"), row.names = c("BATF::JUN_AHR", 
"BATF::JUN_CCR9", "BATF::JUN_IL10"), class = "data.frame")

dat 
#>                     A-XXX  fBM-XXX    P-XXX  vBM-XXX
#> BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
#> BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000
#> BATF::JUN_IL10 0.00000000 0.000000 0.000000 0.000000

I can remove the row with all column zero with this command:

> dat <- dat[ rowSums(dat)!=0, ]
> dat
                    A-XXX  fBM-XXX    P-XXX  vBM-XXX
BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000

But how can I do it with dplyr's pipe style?

回答1:

Here's a dplyr option:

library(dplyr)
filter_all(dat, any_vars(. != 0))

#       A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000

Here we make use of the logic that if any variable is not equal to zero, we will keep it. It's the same as removing rows where all variables are equal to zero.

Regarding row.names:

library(tidyverse)
dat %>% rownames_to_column() %>% filter_at(vars(-rowname), any_vars(. != 0))
#         rowname      A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1  BATF::JUN_AHR 1.51653276 2.228752 1.733567 3.003979
#2 BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000


回答2:

We could use reduce from purrr to get the sum of rows and filter the dataset based on the logical vector

library(tidyverse)
dat %>%
    reduce(`+`) %>%
    {. != 0} %>% 
   filter(dat, .)
#       A-XXX  fBM-XXX    P-XXX  vBM-XXX
#1 1.51653276 2.228752 1.733567 3.003979
#2 0.07703724 0.000000 0.000000 0.000000

NOTE: Within the %>%, the row.names gets stripped off. It may be better to create a new column or assign row.names later


If we need the row names as well, then create a row names column early and then use that to change the row names at the end

dat %>%
  rownames_to_column('rn') %>%
  filter(rowSums(.[-1]) != 0) %>% 
  `row.names<-`(., .[['rn']]) %>% select(-rn)
#                   A-XXX  fBM-XXX    P-XXX  vBM-XXX
#BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
#BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000


回答3:

Here is a third option that uses purrr::pmap to generate the indices of whether or not all rows are zero. Definitely less compact than filter_at, but opens up options for interesting and complex conditions using pmap!

dat <- structure(list(`A-XXX` = c(1.51653275922944, 0.077037240321129, 
                                  0), `fBM-XXX` = c(2.22875185527511, 0, 0), `P-XXX` = c(1.73356698481106, 
                                                                                         0, 0), `vBM-XXX` = c(3.00397859609183, 0, 0)), .Names = c("A-XXX", 
                                                                                                                                                   "fBM-XXX", "P-XXX", "vBM-XXX"), row.names = c("BATF::JUN_AHR", 
                                                                                                                                                                                                 "BATF::JUN_CCR9", "BATF::JUN_IL10"), class = "data.frame")

library(tidyverse)
dat %>%
  rownames_to_column() %>%
  bind_cols(all_zero = pmap_lgl(., function(rowname, ...) all(list(...) == 0))) %>%
  filter(all_zero == FALSE) %>%
  `rownames<-`(.$rowname) %>%
  select(-rowname, -all_zero)
#>                     A-XXX  fBM-XXX    P-XXX  vBM-XXX
#> BATF::JUN_AHR  1.51653276 2.228752 1.733567 3.003979
#> BATF::JUN_CCR9 0.07703724 0.000000 0.000000 0.000000

Created on 2018-03-14 by the reprex package (v0.2.0).