Actual question
How do I define a select helper that selects columns based on their class/type and that is also compatible with dplyr's architecture?
Due diligence
I've looked at https://cran.r-project.org/web/packages/dplyr/vignettes/introduction.html and the help for dplyr::select_helpers, but didn't find anything that would allow me to select based on classes/types.
Example
Introduce some variation in the column classes/types:
library(dplyr)
dat <- mtcars
dat <- dat %>% mutate(
  mpg = as.character(mpg),
  wt = as.factor(wt),
  vs = as.character(vs)
)
In short, I would like a generic version of the following that works for all possible classes/types (and combinations of them) in R:
dat[ , sapply(dat, function(x) is.character(x) || is.factor(x))]
# mpg wt vs
# 1 21 2.62 0
# 2 21 2.875 0
# 3 22.8 2.32 1
# 4 21.4 3.215 1
Based on the question "Subset variables in data frame based on column type", I could do it like this:
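select_on_class <- function(.data, cls = "numeric") {
  # TRUE for each column whose class(es) include any of `cls`;
  # any() copes with multi-class columns such as ordered factors
  keep <- sapply(.data, function(vec, clss) any(class(vec) %in% clss), clss = cls)
  # subset the argument .data, not the global dat
  .data[ , names(.data)[keep]]
}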
dat %>% select_on_class(c("character", "factor"))
# mpg wt vs
# 1 21 2.62 0
# 2 21 2.875 0
# 3 22.8 2.32 1
# 4 21.4 3.215 1
But I would like to be able to use it in calls to dplyr::select, so I tried this:
has_class <- function(.data, cls = "numeric") {
nms <- names(.data)[sapply(.data,
function(vec, clss) class(vec) %in% clss, clss = cls)]
sapply(nms, as.name)
}
dat %>% has_class(c("character", "factor"))
# $mpg
# mpg
#
# $wt
# wt
#
# $vs
# vs
The problem is that sapply(nms, as.name) returns a list, and that doesn't play nicely with the internals of select (which I don't completely understand yet, BTW):
dat %>% select(has_class(c("character", "factor")))
# Error: All select() inputs must resolve to integer column positions.
# The following do not:
# * has_class("character")
dat %>% select_(has_class(c("character", "factor")))
# Error in UseMethod("as.lazy") :
# no applicable method for 'as.lazy' applied to an object of class "list"
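For comparison, select() does accept a plain character vector of column names once it is wrapped in one_of(). A minimal sketch, assuming a hypothetical helper class_names() (a variant of has_class() that takes the data explicitly and returns names instead of symbols):

```r
library(dplyr)

# Same modified data as above
dat <- mtcars %>% mutate(
  mpg = as.character(mpg),
  wt  = as.factor(wt),
  vs  = as.character(vs)
)

# Hypothetical helper: returns a character vector of matching column names
# (not a list of symbols). any() handles columns with several classes,
# e.g. ordered factors, whose class() is c("ordered", "factor").
class_names <- function(.data, cls = "numeric") {
  keep <- vapply(.data, function(vec) any(class(vec) %in% cls), logical(1))
  names(.data)[keep]
}

# one_of() turns the character vector into a column selection
res <- dat %>% select(one_of(class_names(dat, c("character", "factor"))))
names(res)
```

Note that the data has to be passed to class_names() explicitly; inside select() the helper is not handed the data frame automatically.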
EDIT
Based on the answer using select_if, I tried to generalize and got stuck:
has_class <- function(.data, cls) {
sapply(.data, function(vec, clss) class(vec) %in% clss, clss = cls)
}
dat %>% has_class(c("character", "factor"))
# mpg cyl disp hp drat wt qsec vs am gear carb
# TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
dat %>% select_if(has_class, c("character", "factor"))
# Error in vapply(tbl, p, logical(1), ...) : values must be length 1,
# but FUN(X[[1]]) result is length 32
AFAIU, the .predicate function just needs to return a logical vector (which has_class does), and I can pass additional arguments to the .predicate function via ... (which I did). So where am I still going wrong?
If, instead of a list, we return a character vector from our custom function, then we can use one_of:
The most streamlined and generalisable way of achieving this, in my opinion, while still leveraging dplyr, would be to use dplyr::select_if, but in a more direct way than the one suggested by @wjchulme (although that is a nice trick):
And so on, with more classes if needed. I hope this helps.
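A minimal sketch of the direct select_if() approach, assuming the modified dat from the question. select_if() calls the predicate once per column, so an anonymous function that combines the class tests and returns a single TRUE/FALSE is enough:

```r
library(dplyr)

dat <- mtcars %>% mutate(
  mpg = as.character(mpg),
  wt  = as.factor(wt),
  vs  = as.character(vs)
)

# One column in, one TRUE/FALSE out; combine the class tests directly
chr_or_fct <- dat %>% select_if(function(col) is.character(col) || is.factor(col))
names(chr_or_fct)

# A single built-in predicate can also be passed on its own
num_only <- dat %>% select_if(is.numeric)
names(num_only)
```

Further classes are added by extending the `||` chain inside the same predicate.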
I think dplyr::select_if() may be what you are looking for. For example:
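A minimal sketch that generalises this to any set of classes, assuming a hypothetical helper factory is_one_of_classes() (not part of dplyr). Because select_if() calls its predicate once per column, the predicate must return a single TRUE/FALSE; binding `cls` in a closure also avoids relying on select_if() forwarding extra arguments to the predicate, which may not happen in later dplyr releases (where select_if() also has a .funs argument).

```r
library(dplyr)

dat <- mtcars %>% mutate(
  mpg = as.character(mpg),
  wt  = as.factor(wt),
  vs  = as.character(vs)
)

# Hypothetical helper factory (not part of dplyr): returns a one-argument
# predicate that is TRUE when a column has any of the classes in `cls`.
is_one_of_classes <- function(cls) {
  force(cls)  # capture cls now rather than lazily at the first call
  function(vec) any(class(vec) %in% cls)
}

res <- dat %>% select_if(is_one_of_classes(c("character", "factor")))
names(res)

# Works for a single class too
res_fct <- dat %>% select_if(is_one_of_classes("factor"))
names(res_fct)
```

With dplyr 1.0 or later, the same predicate can also be used inside select() via where(), e.g. `dat %>% select(where(is_one_of_classes(c("character", "factor"))))`.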