Select columns based on classes/types with compati

Posted 2019-02-25 19:56

Question:

Actual question

How do I define a select helper that selects columns based on their class/type and that is also compatible with dplyr's architecture?

Due diligence

I've looked at https://cran.r-project.org/web/packages/dplyr/vignettes/introduction.html and the help for dplyr::select_helpers, but didn't find anything that would allow me to select columns based on their classes/types.

Example

Bring in some variation WRT classes/types:

library(dplyr)

dat <- mtcars
dat <- dat %>% mutate(
  mpg = as.character(mpg),
  wt = as.factor(wt),
  vs = as.character(vs)
)

In short, I would like to make this a generic approach for all possible classes/types (and combinations of them) in R:

dat[ , sapply(dat, is.character)]
# mpg    wt vs
# 1    21  2.62  0
# 2    21 2.875  0
# 3  22.8  2.32  1
# 4  21.4 3.215  1

Based on "Subset variables in data frame based on column type", I could do it like this:

select_on_class <- function(.data, cls = "numeric") {
  # Subset the data frame passed in (.data), not the global `dat`
  .data[ , names(.data)[sapply(.data,
    function(vec, clss) class(vec) %in% clss, clss = cls)]]
}
dat %>% select_on_class(c("character", "factor"))
# mpg    wt vs
# 1    21  2.62  0
# 2    21 2.875  0
# 3  22.8  2.32  1
# 4  21.4 3.215  1

But I would like to be able to use it in calls to dplyr::select, so I tried this:

has_class <- function(.data, cls = "numeric") {
  nms <- names(.data)[sapply(.data,
    function(vec, clss) class(vec) %in% clss, clss = cls)]
  sapply(nms, as.name)
}
dat %>% has_class(c("character", "factor"))
# $mpg
# mpg
# 
# $wt
# wt
# 
# $vs
# vs

The problem is that sapply(nms, as.name) returns a list and that doesn't play nicely with the internals of select (which I don't completely understand yet, BTW):

dat %>% select(has_class(c("character", "factor")))
# Error: All select() inputs must resolve to integer column positions.
# The following do not:
#   *  has_class("character")

dat %>% select_(has_class(c("character", "factor")))
# Error in UseMethod("as.lazy") : 
#   no applicable method for 'as.lazy' applied to an object of class "list"

EDIT

Based on the answer using select_if I tried to generalize and got stuck:

has_class <- function(.data, cls) {
  sapply(.data, function(vec, clss) class(vec) %in% clss, clss = cls)
}
dat %>% has_class(c("character", "factor"))
# mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
# TRUE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE
dat %>% select_if(has_class, c("character", "factor"))
# Error in vapply(tbl, p, logical(1), ...) : values must be length 1,
# but FUN(X[[1]]) result is length 32

AFAIU, the .predicate function just needs to return a logical vector (which has_class does), and I can pass additional arguments to it via ... (which I did). So where am I still going wrong?

Answer 1:

I think dplyr::select_if() may be what you are looking for. For example

dat <- mtcars %>%
  mutate(
    mpg = as.character(mpg),
    wt = as.character(wt),
    vs = as.character(vs)
  ) %>%
  select_if(is.character)
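
The error in the question's EDIT most likely comes from select_if() calling the predicate once per column: has_class then receives a single vector of 32 values and sapply() iterates over its elements, which is why the result has length 32 instead of 1. A minimal sketch of a per-column predicate follows. It assumes the question's data, where wt is a factor, and is_chr_or_fct is a hypothetical name introduced only for illustration. The classes are fixed inside the function because how select_if() forwards ... has differed between dplyr versions.

```r
library(dplyr)

# Same setup as in the question: two character columns and one factor column
dat <- mtcars %>%
  mutate(
    mpg = as.character(mpg),
    wt = as.factor(wt),
    vs = as.character(vs)
  )

# Hypothetical helper (not part of dplyr): receives ONE column and
# returns a single TRUE/FALSE, which is what select_if() expects
is_chr_or_fct <- function(col) inherits(col, c("character", "factor"))

res <- dat %>% select_if(is_chr_or_fct)
names(res)
# [1] "mpg" "wt"  "vs"
```

inherits() returns a single logical even for columns with more than one class, whereas class(vec) %in% clss returns one value per class.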


Answer 2:

If, instead of a list, our custom function returns a character vector, then we can use one_of:

has_class_v1 <- function(.data, cls = "numeric") {
  names(.data)[sapply(.data,
    function(vec, clss) class(vec) %in% clss, clss = cls)]
}

has_class_v1(dat, "character")
# [1] "mpg" "wt"  "vs" 

# use one_of
dat %>%
  select(one_of(has_class_v1(.,"character"))) %>% 
  head
#    mpg    wt vs
# 1   21  2.62  0
# 2   21 2.875  0
# 3 22.8  2.32  1
# 4 21.4 3.215  1
# 5 18.7  3.44  0
# 6 18.1  3.46  1


Answer 3:

In my opinion, the most streamlined and generalisable way of achieving this while leveraging dplyr is to use dplyr::select_if, but more directly than suggested by @wjchulme (although that is a nice trick):

dat %>%
  select_if(sapply(., class) %in% c("numeric", "character"))

And so on, with more classes if needed. I hope this helps.
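
As a sketch of how the logical-vector form above might be wrapped for reuse: sapply(., class) returns a list rather than a character vector as soon as one column has several classes (for example an ordered factor), so the version below builds the vector with vapply() and inherits() instead. cols_with_class is a hypothetical helper name, and the data setup assumes the question's dat, with wt as a factor.

```r
library(dplyr)

# Same setup as in the question
dat <- mtcars %>%
  mutate(
    mpg = as.character(mpg),
    wt = as.factor(wt),
    vs = as.character(vs)
  )

# Hypothetical helper (not part of dplyr): one TRUE/FALSE per column,
# TRUE when the column inherits from any class listed in `cls`
cols_with_class <- function(.data, cls) {
  vapply(.data, inherits, logical(1), what = cls)
}

# select_if() also accepts a logical vector with one entry per column
res <- dat %>% select_if(cols_with_class(., c("character", "factor")))
names(res)
# [1] "mpg" "wt"  "vs"
```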