I'm writing a package of functions for making tables of demographic data. I have one function, abbreviated below, where I need to take in several columns (...) on which I'll gather a data frame. The trick is that I'd like to keep those columns' names in order, because after gathering I need to put a column into that order. In this case, those columns are estimate, moe, share, sharemoe.
library(tidyverse)
library(rlang)
race <- structure(list(region = c("New Haven", "New Haven", "New Haven", "New Haven", "Outer Ring", "Outer Ring", "Outer Ring", "Outer Ring"),
variable = c("white", "black", "asian", "latino", "white", "black", "asian", "latino"),
estimate = c(40164, 42970, 6042, 37231, 164150, 3471, 9565, 8518),
moe = c(1395, 1383, 697, 1688, 1603, 677, 896, 1052),
share = c(0.308, 0.33, 0.046, 0.286, 0.87, 0.018, 0.051, 0.045),
sharemoe = c(0.011, 0.011, 0.005, 0.013, 0.008, 0.004, 0.005, 0.006)),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -8L))
race
#> # A tibble: 8 x 6
#> region variable estimate moe share sharemoe
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 New Haven white 40164 1395 0.308 0.011
#> 2 New Haven black 42970 1383 0.33 0.011
#> 3 New Haven asian 6042 697 0.046 0.005
#> 4 New Haven latino 37231 1688 0.286 0.013
#> 5 Outer Ring white 164150 1603 0.87 0.008
#> 6 Outer Ring black 3471 677 0.018 0.004
#> 7 Outer Ring asian 9565 896 0.051 0.005
#> 8 Outer Ring latino 8518 1052 0.045 0.006
In the function gather_arrange, I'm getting the names of the ... columns by mapping over rlang::exprs(...) and converting each one to character. It was a struggle to get this working, so extracting the column names as strings might be a place to improve or rewrite. But it works how I want: it makes the column type a factor with levels estimate, moe, share, sharemoe, in this order.
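A small sketch of the relevelling step on its own, assuming forcats, purrr and rlang are installed. Note that estimate, moe, share, sharemoe also happens to be alphabetical order, so as.factor() alone would already produce those levels; fct_relevel() only changes anything when the requested order differs, as in this made-up example:

```r
library(forcats)
library(purrr)
library(rlang)

type <- c("sharemoe", "share", "moe", "estimate")
# Same conversion as in gather_arrange: each bare symbol becomes one string
gather_names <- unname(purrr::map_chr(rlang::exprs(share, estimate), as.character))
# as.factor() sorts levels alphabetically; fct_relevel() moves the requested ones to the front
levels(forcats::fct_relevel(as.factor(type), gather_names))
# -> "share" "estimate" "moe" "sharemoe"
```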
gather_arrange <- function(df, ..., group = variable) {
gather_cols <- rlang::quos(...)
grp_var <- rlang::enquo(group)
gather_names <- purrr::map_chr(rlang::exprs(...), as.character)
df %>%
tidyr::gather(key = type, value = value, !!!gather_cols) %>%
dplyr::mutate(!!rlang::quo_name(grp_var) := !!grp_var %>%
forcats::fct_inorder() %>% forcats::fct_rev()) %>%
dplyr::mutate(type = as.factor(type) %>% forcats::fct_relevel(gather_names)) %>%
dplyr::arrange(type)
}
race %>% gather_arrange(estimate, moe, share, sharemoe)
#> # A tibble: 32 x 4
#> region variable type value
#> <chr> <fct> <fct> <dbl>
#> 1 New Haven white estimate 40164
#> 2 New Haven black estimate 42970
#> 3 New Haven asian estimate 6042
#> 4 New Haven latino estimate 37231
#> 5 Outer Ring white estimate 164150
#> 6 Outer Ring black estimate 3471
#> 7 Outer Ring asian estimate 9565
#> 8 Outer Ring latino estimate 8518
#> 9 New Haven white moe 1395
#> 10 New Haven black moe 1383
#> # ... with 22 more rows
But I'd like the option of also using the colon notation for selecting columns, i.e. estimate:sharemoe, as the equivalent of typing out all those column names.
race %>% gather_arrange(estimate:sharemoe)
#> Error: Result 1 is not a length 1 atomic vector
This fails because the column names can't be pulled out of rlang::exprs(...) that way. How can I get the column names with this notation? Thanks in advance!
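The error comes from as.character() itself: applied to a symbol it returns one string, but applied to a call such as estimate:sharemoe it returns one string per element of the call, so map_chr() rejects the length-3 result. A minimal illustration using base R and rlang:

```r
library(rlang)

# A bare symbol converts to a single string, which is what map_chr() expects
as.character(quote(estimate))
# -> "estimate"

# A call converts to one string per element: the function, then its arguments
as.character(quote(estimate:sharemoe))
# -> ":" "estimate" "sharemoe"

captured <- rlang::exprs(estimate, estimate:sharemoe)
# Length of each converted argument: 1 for the symbol, 3 for the range
unname(vapply(captured, function(e) length(as.character(e)), integer(1)))
# -> 1 3
```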
We could create an if condition for the cases that use the : operator, get the column names ('gather_names') from select, and use those in fct_relevel.
Checking:
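A minimal sketch of that idea, assuming dplyr, purrr and rlang are installed; gather_names_for() is a hypothetical helper name, not the answerer's code:

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical helper: returns the gathered column names in the order given
gather_names_for <- function(df, ...) {
  cols <- rlang::enquos(...)
  exprs <- purrr::map(cols, rlang::quo_get_expr)
  if (any(purrr::map_lgl(exprs, rlang::is_call))) {
    # At least one argument like estimate:sharemoe, so let select() expand it
    names(dplyr::select(df, !!!cols))
  } else {
    # Only bare column names: convert each symbol to a string
    unname(purrr::map_chr(exprs, rlang::as_name))
  }
}

race_demo <- data.frame(region = "New Haven", variable = "white",
                        estimate = 40164, moe = 1395, share = 0.308, sharemoe = 0.011)
gather_names_for(race_demo, estimate:sharemoe)
# -> "estimate" "moe" "share" "sharemoe"
gather_names_for(race_demo, share, estimate)
# -> "share" "estimate"
```

Since select() also accepts bare names, the if branch is optional; it only keeps the original symbol-to-string path for the simple case.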
Update
If we are passing multiple sets of columns in the ...
Checking:
Or, check for only a single set of columns:
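A sketch of a gather_arrange() that takes its names from select(), so that several selections such as estimate:moe and share:sharemoe can be passed at once. It assumes tidyr's gather() (superseded by pivot_longer() but still exported) and is a reconstruction of the idea rather than the answerer's code:

```r
library(dplyr)
library(tidyr)
library(forcats)
library(rlang)

gather_arrange <- function(df, ..., group = variable) {
  gather_cols <- rlang::enquos(...)
  grp_var <- rlang::enquo(group)
  # select() understands bare names, ranges and several selections at once
  gather_names <- names(dplyr::select(df, !!!gather_cols))
  df %>%
    tidyr::gather(key = type, value = value, !!!gather_cols) %>%
    dplyr::mutate(!!rlang::as_name(grp_var) :=
                    forcats::fct_rev(forcats::fct_inorder(!!grp_var))) %>%
    dplyr::mutate(type = forcats::fct_relevel(as.factor(type), gather_names)) %>%
    dplyr::arrange(type)
}

race_demo <- data.frame(
  region = rep(c("New Haven", "Outer Ring"), each = 2),
  variable = rep(c("white", "black"), 2),
  estimate = c(40164, 42970, 164150, 3471), moe = c(1395, 1383, 1603, 677),
  share = c(0.308, 0.33, 0.87, 0.018), sharemoe = c(0.011, 0.011, 0.008, 0.004)
)
out <- gather_arrange(race_demo, estimate:moe, share:sharemoe)
levels(out$type)
# -> "estimate" "moe" "share" "sharemoe"
```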
I think the function you are looking for is tidyselect::vars_select(), which is used internally by select and rename to accomplish this task. It returns a character vector of variable names. For example:
This allows you to use all the same syntax that is valid for dplyr::select.
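For the column names in the question, a minimal sketch, assuming tidyselect is installed (race_cols is just the question's column names typed out by hand):

```r
library(tidyselect)

race_cols <- c("region", "variable", "estimate", "moe", "share", "sharemoe")

# vars_select() takes the available names plus select()-style expressions
# and returns a named character vector of the chosen names, in order
tidyselect::vars_select(race_cols, estimate:sharemoe)
tidyselect::vars_select(race_cols, share, estimate:moe)
```

Inside gather_arrange, the gather_names line could then become gather_names <- unname(tidyselect::vars_select(names(df), !!!gather_cols)); unname() just drops the names, which fct_relevel() doesn't need.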
This deals with having both : range expressions and other column names as arguments, e.g. fun(race, estimate:sharemoe, region).
Interestingly, this hacky solution appears to be quicker than tidyselect (not that variable selection is likely to be a pain point in the overall speed).
Original function:
Function using the above-defined fun:
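As a hypothetical illustration of that kind of base-R approach (not the answerer's fun, and not benchmarked here), the sketch below expands : ranges by matching their endpoints against names(df) and passes other bare names through; expand_col_names() is an invented name. Unlike select(), it supports no negation or selection helpers:

```r
library(rlang)

# Hypothetical helper: expand a:b ranges and bare names into column names
expand_col_names <- function(df, ...) {
  cols <- names(df)
  args <- rlang::enexprs(...)
  out <- lapply(args, function(e) {
    if (rlang::is_call(e, ":")) {
      # e[[2]] and e[[3]] are the two endpoints of the range
      ends <- match(c(rlang::as_name(e[[2]]), rlang::as_name(e[[3]])), cols)
      if (anyNA(ends)) stop("unknown column in range: ", deparse(e))
      cols[ends[1]:ends[2]]
    } else {
      nm <- rlang::as_name(e)
      if (!nm %in% cols) stop("unknown column: ", nm)
      nm
    }
  })
  unname(unlist(out))
}

race_demo <- data.frame(region = "New Haven", variable = "white",
                        estimate = 40164, moe = 1395, share = 0.308, sharemoe = 0.011)
expand_col_names(race_demo, estimate:sharemoe, region)
# -> "estimate" "moe" "share" "sharemoe" "region"
```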