I am trying to create a function that passes a list of column names to a dplyr
function. I know how to do this if the list of columns names is given in the ...
form, as explained in the tidyeval
documentation:
df <- tibble(
g1 = c(1, 1, 2, 2, 2),
g2 = c(1, 2, 1, 2, 1),
a = sample(5),
b = sample(5)
)
my_summarise <- function(df, ...) {
group_var <- quos(...)
df %>%
group_by(!!!group_var) %>%
summarise(a = mean(a))
}
my_summarise(df, g1, g2)
But if I want to list the column names as an argument of the function, the above solution will not work (of course):
my_summarise <- function(df, group_var, sum_var) {
group_var <- quos(group_var) # nor enquo(group_var)
sum_var <- enquo(sum_var)
df %>%
group_by(!!!group_var) %>%
summarise(a = mean(a))
}
my_summarise(df, list(g1, g2), a)
my_summarise(df, list(g1, g2), b)
How can I get the items inside the list to be quoted individually?
This question is similar to Passing dataframe column names in a function inside another function but in the comments it was suggested to use strings, while here I would like to use bare column names.
You could pass your list of arguments using alist
instead of list
, as it won't evaluate the arguments.
my_summarise = function(df, group_var, sum_var) {
group_var = quos(!!! group_var)
sum_var = enquo(sum_var)
df %>%
group_by(!!! group_var) %>%
summarise(!! quo_name( sum_var) := mean( !! sum_var) )
}
my_summarise(df, alist(g1, g2), b)
# A tibble: 4 x 3
# Groups: g1 [?]
g1 g2 b
<dbl> <dbl> <dbl>
1 1 1 2.0
2 1 2 3.0
3 2 1 4.5
4 2 2 1.0
Another alternative would be to pass that argument directly with quos
instead of list
as shown in this answer, which bypasses some complications all together.
my_summarise = function(df, group_var, sum_var) {
# group_var = quos(!!! group_var)
sum_var = enquo(sum_var)
df %>%
group_by(!!! group_var) %>%
summarise(!! quo_name( sum_var) := mean( !! sum_var) )
}
my_summarise(df, quos(g1, g2), b)
# A tibble: 4 x 3
# Groups: g1 [?]
g1 g2 b
<dbl> <dbl> <dbl>
1 1 1 2.0
2 1 2 3.0
3 2 1 4.5
4 2 2 1.0
library(dplyr)
df <- tibble(
g1 = c(1, 1, 2, 2, 2),
g2 = c(1, 2, 1, 2, 1),
a = sample(5),
b = sample(5)
)
my_summarise = function(df, group_var, fun_name) {
df %>%
group_by(!!! group_var) %>%
summarize_all(fun_name)
}
my_summarise(df, alist(g1, g2), mean)
alist() handles the arguments 'g1' and 'g2' as function arguments (does not evaluate them) while !!! (same as UQS() unquotes and splices the list. sum_var is not necessary as it looks like you want to take the mean of both 'a' and 'b'. Also, you can generalize it by passing in the function as well.