I would like to understand how to pass strings representing expressions into dplyr, so that the variables mentioned in the string are evaluated as expressions on columns in the dataframe. The main vignette on this topic covers passing in quosures, and doesn't discuss strings at all.
It's clear that quosures are safer and clearer than strings when representing expressions, so of course we should avoid strings when quosures can be used instead. However, when working with tools outside the R ecosystem, such as javascript or YAML config files, one will often have to work with strings instead of quosures.
For example, say I want a function that does a grouped tally using expressions passed in by the user/caller. As expected, the following code doesn't work, since dplyr uses nonstandard evaluation to interpret the arguments to group_by
.
library(tidyverse)
group_by_and_tally <- function(data, groups) {
data %>%
group_by(groups) %>%
tally()
}
my_groups <- c('2 * cyl', 'am')
mtcars %>%
group_by_and_tally(my_groups)
#> Error in grouped_df_impl(data, unname(vars), drop): Column `groups` is unknown
In dplyr 0.5 we would use standard evaluation, such as group_by_(.dots = groups)
, to handle this situation. Now that the underscore verbs are deprecated, how should we do this kind of thing in dplyr 0.7?
In the special case of expressions that are just column names we can use the solutions to this question, but they don't work for more complex expressions like 2 * cyl
that aren't just a column name.
It's important to note that, in this simple example, we have control of how the expressions are created. So the best way to pass the expressions is to construct and pass quosures directly using quos()
:
library(tidyverse)
library(rlang)
group_by_and_tally <- function(data, groups) {
data %>%
group_by(UQS(groups)) %>%
tally()
}
my_groups <- quos(2 * cyl, am)
mtcars %>%
group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2
However, if we receive the expressions from an outside source in the form of strings, we can simply parse the expressions first, which converts them to quosures:
my_groups <- c('2 * cyl', 'am')
my_groups <- my_groups %>% map(parse_quosure)
mtcars %>%
group_by_and_tally(my_groups)
#> # A tibble: 6 x 3
#> # Groups: 2 * cyl [?]
#> `2 * cyl` am n
#> <dbl> <dbl> <int>
#> 1 8 0 3
#> 2 8 1 8
#> 3 12 0 4
#> 4 12 1 3
#> 5 16 0 12
#> 6 16 1 2
Again, we should only do this if we are getting expressions from an outside source that provides them as strings - otherwise we should make quosures directly in the R source code.
It is tempting to use strings but it is almost always better to use expressions. Now that you have quasiquotation, you can easily build up expressions in a flexible way:
lhs <- "cyl"
rhs <- "disp"
expr(!!sym(lhs) * !!sym(rhs))
#> cyl * disp
vars <- c("cyl", "disp")
expr(sum(!!!syms(vars)))
#> sum(cyl, disp)
Package friendlyeval
can help you with this:
library(tidyverse)
library(friendlyeval)
group_by_and_tally <- function(data, groups) {
data %>%
group_by(!!!friendlyeval::treat_strings_as_exprs(groups)) %>%
tally()
}
my_groups <- c('2 * cyl', 'am')
mtcars %>%
group_by_and_tally(my_groups)
# # A tibble: 6 x 3
# # Groups: 2 * cyl [?]
# `2 * cyl` am n
# <dbl> <dbl> <int>
# 1 8 0 3
# 2 8 1 8
# 3 12 0 4
# 4 12 1 3
# 5 16 0 12
# 6 16 1 2