可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Reproducible example
cats <-
data.frame(
name = c(letters[1:10]),
weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
type = c(rep("not_fat", 5), rep("fat", 5))
)
get_means <- function(df, metric, group) {
df %>%
group_by(.[[group]]) %>%
mutate(mean_stat = mean(.[[metric]])) %>%
pull(mean_stat) %>%
unique()
}
get_means(cats, metric = "weight", group = "type")
What I tried
I expect to get two values back, instead I get one value. It appears that the groupby is failing.
I tried everything including using quo(), eval() and substitute(), UQ(), !!, and a whole host of other things to try and make the stuff inside the group_by() work.
This seems awfully simple but I can't figure it out.
Reasoning for code
The decision for variables to be in quotes is because I am using them in ggplot aes_string() calls. I excluded ggplot code inside the function to simplify the code, otherwise it'd be easy because we could use standard evaluation.
回答1:
I think the "intended" way to do this in the tidyeval framework is to enter the arguments as names (rather than strings) and then quote the arguments using enquo()
. ggplot2
understands tidy evaluation operators so this works for ggplot2
as well.
First, let's adapt the dplyr
summary function in your example:
library(tidyverse)
library(rlang)
get_means <- function(df, metric, group) {
metric = enquo(metric)
group = enquo(group)
df %>%
group_by(!!group) %>%
summarise(!!paste0("mean_", as_label(metric)) := mean(!!metric))
}
get_means(cats, weight, type)
type mean_weight
1 fat 20.0
2 not_fat 10.2
get_means(iris, Petal.Width, Species)
Species mean_Petal.Width
1 setosa 0.246
2 versicolor 1.33
3 virginica 2.03
Now add in ggplot:
get_means <- function(df, metric, group) {
metric = enquo(metric)
group = enquo(group)
df %>%
group_by(!!group) %>%
summarise(mean_stat = mean(!!metric)) %>%
ggplot(aes(!!group, mean_stat)) +
geom_point()
}
get_means(cats, weight, type)
I'm not sure what type of plot you have in mind, but you can plot the data and summary values using tidy evaluation. For example:
plot_func = function(data, metric, group) {
metric = enquo(metric)
group = enquo(group)
data %>%
ggplot(aes(!!group, !!metric)) +
geom_point() +
geom_point(data=. %>%
group_by(!!group) %>%
summarise(!!metric := mean(!!metric)),
shape="_", colour="red", size=8) +
expand_limits(y=0) +
scale_y_continuous(expand=expand_scale(mult=c(0,0.02)))
}
plot_func(cats, weight, type)
FYI, you can allow the function to take any number of grouping variables (including none) using the ...
argument and enquos
instead of enquo
(which also requires the use of !!!
(unquote-splice) instead of !!
(unquote)).
get_means <- function(df, metric, ...) {
metric = enquo(metric)
groups = enquos(...)
df %>%
group_by(!!!groups) %>%
summarise(!!paste0("mean_", quo_text(metric)) := mean(!!metric))
}
get_means(mtcars, mpg, cyl, vs)
cyl vs mean_mpg
1 4 0 26
2 4 1 26.7
3 6 0 20.6
4 6 1 19.1
5 8 0 15.1
get_means(mtcars, mpg)
mean_mpg
1 20.1
回答2:
The magrittr pronoun .
represents the whole data, so you've taken the mean of all observations. Instead, use the tidy eval pronoun .data
which represents the slice of data frame for the current group:
get_means <- function(df, metric, group) {
df %>%
group_by(.data[[group]]) %>%
mutate(mean_stat = mean(.data[[metric]])) %>%
pull(mean_stat) %>%
unique()
}
回答3:
If you want to use strings for the names, as in your example, the correct way to do this is to convert the string to a symbol with sym
and unquote with !!
:
get_means <- function(df, metric, group) {
df %>%
group_by(!!sym(group)) %>%
mutate(mean_stat = mean(!!sym(metric))) %>%
pull(mean_stat) %>%
unique()
}
get_means(cats, metric = "weight", group = "type")
[1] 10.06063 17.45906
If you want to use bare names in your function, then use enquo
with !!
:
get_means <- function(df, metric, group) {
group <- enquo(group)
metric <- enquo(metric)
df %>%
group_by(!!group) %>%
mutate(mean_stat = mean(!!metric)) %>%
pull(mean_stat) %>%
unique()
}
get_means(cats, metric = weight, group = type)
[1] 10.06063 17.45906
What is happening in your example?
Interestingly .[[group]]
, does work for grouping, but not the way you think. This subsets the stated column of the dataframe as a vector, then makes that a new variable that it groups on:
cats %>%
group_by(.[['type']])
# A tibble: 10 x 4
# Groups: .[["type"]] [2]
name weight type `.[["type"]]`
<fct> <dbl> <fct> <fct>
1 a 9.60 not_fat not_fat
2 b 8.71 not_fat not_fat
3 c 12.0 not_fat not_fat
4 d 8.48 not_fat not_fat
5 e 11.5 not_fat not_fat
6 f 17.0 fat fat
7 g 20.3 fat fat
8 h 17.3 fat fat
9 i 15.3 fat fat
10 j 17.4 fat fat
Your problem comes with the mutate
statement. Instead of selecting the, mutate(mean_stat = mean(.[['weight']]))
simply extracts the weight
column as a vector, computes the mean, and then assigns that single value to the new column
cats %>%
group_by(.[['type']]) %>%
mutate(mean_stat = mean(.[['weight']]))
# A tibble: 10 x 5
# Groups: .[["type"]] [2]
name weight type `.[["type"]]` mean_stat
<fct> <dbl> <fct> <fct> <dbl>
1 a 9.60 not_fat not_fat 13.8
2 b 8.71 not_fat not_fat 13.8
3 c 12.0 not_fat not_fat 13.8
4 d 8.48 not_fat not_fat 13.8
5 e 11.5 not_fat not_fat 13.8
6 f 17.0 fat fat 13.8
7 g 20.3 fat fat 13.8
8 h 17.3 fat fat 13.8
9 i 15.3 fat fat 13.8
10 j 17.4 fat fat 13.8
回答4:
I would go with slight modification (if I understand correctly what you would like to achive):
get_means <- function(df, metric, group) {
df %>%
group_by(!!sym(group)) %>%
summarise(mean_stat = mean(!!sym(metric)))%>% pull(mean_stat)
}
get_means(cats, "weight", "type")
[1] 20.671772 9.305811
gives exactly same output as :
cats %>% group_by(type) %>% summarise(mean_stat=mean(weight)) %>%
pull(mean_stat)
[1] 20.671772 9.305811
回答5:
using *_at
functions :
library(dplyr)
get_means <- function(df, metric, group) {
df %>%
group_by_at(group) %>%
mutate_at(metric,list(mean_stat = mean)) %>%
pull(mean_stat) %>%
unique()
}
get_means(cats, metric = "weight", group = "type")
# [1] 10.12927 20.40541
data
set.seed(1)
cats <-
data.frame(
name = c(letters[1:10]),
weight = c(rnorm(5, 10, 1), rnorm(5, 20, 3)),
type = c(rep("not_fat", 5), rep("fat", 5))
)