Recent versions of dplyr deprecate the underscore versions of functions, such as filter_, in favour of tidy evaluation.
What is the expected replacement for the underscore forms under the new approach? How do I write it so that R CMD check does not report undefined symbols?
library(dplyr)
df <- data_frame(id = rep(c("a","b"), 3), val = 1:6)
df %>% filter_(~id == "a")
# want to avoid this, because it references column id in a variable-style
df %>% filter( id == "a" )
# option A
df %>% filter( UQ(rlang::sym("id")) == "a" )
# option B
df %>% filter( UQ(as.name("id")) == "a" )
# option C
df %>% filter( .data$id == "a" )
Is there a preferred or more concise form? Option C is the shortest, but it is slower on some of my larger real-world datasets and in more complex dplyr constructs:
microbenchmark(
  sym = dsPClosest %>%
    group_by(!!sym(dateVarName), !!sym("depth")) %>%
    summarise(temperature = mean(!!sym("temperature"), na.rm = TRUE),
              moisture = mean(!!sym("moisture"), na.rm = TRUE)) %>%
    ungroup(),
  data = dsPClosest %>%
    group_by(!!sym(dateVarName), .data$depth) %>%
    summarise(temperature = mean(.data$temperature, na.rm = TRUE),
              moisture = mean(.data$moisture, na.rm = TRUE)) %>%
    ungroup(),
  times = 10
)
#Unit: milliseconds
# expr        min         lq      mean     median        uq       max neval
#  sym   80.05512   84.97267  122.7513   94.79805  100.9679  392.1375    10
# data 4652.83104 4741.99165 5371.5448 5039.63307 5471.9261 7926.7648    10
There is another answer for mutate_ using even more complex syntax.
Based on your comment, I guess it would be:
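A minimal sketch of that form, assuming the toy df from the question (tibble() stands in here for the deprecated data_frame(), and res_sym is a hypothetical name used only for illustration):

```r
library(dplyr)

# Same toy data as in the question; tibble() replaces the deprecated data_frame()
df <- tibble(id = rep(c("a", "b"), 3), val = 1:6)

# as.name() turns the string "id" into a symbol; !! unquotes it inside filter()
res_sym <- df %>% filter(!!as.name("id") == "a")
```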
rlang is unnecessary, as you can do this with !! and as.name instead of UQ and sym.
But maybe a better option is a scoped filter, which avoids quosure-related issues:
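A minimal sketch of such a scoped filter, again assuming the toy df from the question (res_scoped is a hypothetical name used only for illustration):

```r
library(dplyr)

# Same toy data as in the question; tibble() replaces the deprecated data_frame()
df <- tibble(id = rep(c("a", "b"), 3), val = 1:6)

# vars("id") selects the column(s) to test; all_vars(. == "a") is the predicate
res_scoped <- df %>% filter_at(vars("id"), all_vars(. == "a"))
```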
In the code above, vars() determines which columns the filtering statement is applied to (in the help for filter_at, the filtering statement is called the "predicate"). In this case, vars("id") means the filtering statement is applied only to the id column. The filtering statement can be either an all_vars() or any_vars() statement, though they're equivalent in this case. all_vars(. == "a") means that all of the columns in vars("id") must equal "a". Yes, it's a bit confusing.

Timings for data similar to your example:

In this case, we use group_by_at and summarise_at, which are scoped versions of those functions:

Original Answer
I think this is a case where you would enter the filter variable as a bare name and then use enquo and !! (the equivalent of UQ) to use the filter variable. For example:

Similar to option C (.data$) but shorter. However, it showed poor performance on my real-world application.
Moreover, I did not find documentation on when this can be used.
Still quite verbose, but avoids the double quotes.
The microbenchmark timing is equal to (or a tiny bit faster than) that of option B, i.e. the as.name solution.