Functional programming with dplyr

Posted 2019-02-09 06:52

Question:

I'm looking for a more efficient / elegant way to pass multiple arguments to a group_by using non-standard evaluation inside a function that uses dplyr. I don't want to use the ... operator; I want to specify the arguments individually.

My specific use case is a function which takes a data frame and creates a ggplot object with simpler syntax. Here is an example of the code I want to automate with my function:

library(dplyr)
library(ggplot2)

# create data frame
my_df <- data.frame(month = sample(1:12, 1000, replace = TRUE),
                    category = sample(head(letters, 3), 1000, replace = TRUE),
                    approved = as.numeric(runif(1000) < 0.5))

my_df$converted <- my_df$approved * as.numeric(runif(1000) < 0.5)

my_df %>%
  group_by(month, category) %>%
  summarize(conversion_rate = sum(converted) / sum(approved)) %>%
  ggplot + geom_line(aes(x = month, y = conversion_rate, group = category, 
  color = category))

I want to combine that group_by, summarize, ggplot, and geom_line into a simple function that I can feed an x, y, and group, and have it perform all the dirty work under the hood. Here's what I've gotten to work:

# create the function that does the grouping and plotting
plot_lines <- function(df, x, y, group) {

  x <- enquo(x)
  group <- enquo(group)
  group_bys <- quos(!! x, !! group)

  df %>%
    group_by(!!! group_bys) %>%
    my_smry %>%
    ggplot + geom_line(aes_(x = substitute(x), y = substitute(y), 
    group = substitute(group), color = substitute(group)))
}

# create a function to do the summarization
my_smry <- function(x) {
  x %>% 
    summarize(conversion_rate = sum(converted) / sum(approved))
}

# use my function
my_df %>% 
  plot_lines(x = month, y = conversion_rate, group = category)

I feel like the group_by handling is pretty inelegant: quoting x and group with enquo, then unquoting them with !! inside of another quoting function quos, only to re-unquote them with !!! on the next line, but it's the only thing I've been able to get to work. Is there a better way to do this?

Also, is there a way to get ggplot to take !! instead of substitute? What I'm doing feels inconsistent.

Answer 1:

At the time of writing, ggplot2 had not yet been updated to handle quosures, so you have to pass it plain expressions, which you can extract from quosures with rlang::quo_expr (renamed rlang::quo_squash in later rlang releases):

library(tidyverse)
set.seed(47)

# tibble() replaces the deprecated data_frame()
my_df <- tibble(month = sample(1:12, 1000, replace = TRUE),
                category = sample(head(letters, 3), 1000, replace = TRUE),
                approved = as.numeric(runif(1000) < 0.5),
                converted = approved * as.numeric(runif(1000) < 0.5))

plot_lines <- function(df, x, y, group) {
    x <- enquo(x)
    y <- enquo(y)
    group <- enquo(group)

    df %>%
        group_by(!! x, !! group) %>%
        summarise(conversion_rate = sum(converted) / sum(approved)) %>%
        # quo_squash() is the current name for quo_expr(), which rlang has deprecated
        ggplot(aes_(x = rlang::quo_squash(x), 
                    y = rlang::quo_squash(y), 
                    color = rlang::quo_squash(group))) + 
        geom_line()
}

my_df %>% plot_lines(month, conversion_rate, category)
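
To make the quosure-versus-expression distinction concrete, here is a minimal sketch. The helper name capture is hypothetical, and quo_get_expr is used here only as one way to pull the expression out of a quosure; it is not the call the answer itself uses.

```r
library(rlang)

# Hypothetical helper: capture the argument as a quosure, as plot_lines does.
capture <- function(x) enquo(x)

q <- capture(month)

# A quosure bundles the expression with the environment it was written in.
is_quosure(q)

# Extracting the expression drops the environment and leaves the bare symbol.
expr_only <- quo_get_expr(q)
is.symbol(expr_only)
```

The bare symbol is what aes_ receives in the function above, while group_by gets the full quosure through !!.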

However, keep in mind that ggplot will almost inevitably be updated from lazyeval to rlang, so while this interface will probably keep working, a simpler, more consistent one will probably be possible shortly.
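
That update has since landed: ggplot2 3.0.0 added support for quasiquotation in aes(), so !! can be used there just as in group_by. The following is a sketch of how the same function might look under that assumption (ggplot2 3.0.0 or later installed); it is not part of the original answer.

```r
library(dplyr)
library(ggplot2)

set.seed(47)

my_df <- tibble(
  month     = sample(1:12, 1000, replace = TRUE),
  category  = sample(head(letters, 3), 1000, replace = TRUE),
  approved  = as.numeric(runif(1000) < 0.5),
  converted = approved * as.numeric(runif(1000) < 0.5)
)

plot_lines <- function(df, x, y, group) {
  # Capture each argument once as a quosure.
  x <- enquo(x)
  y <- enquo(y)
  group <- enquo(group)

  df %>%
    group_by(!!x, !!group) %>%
    summarise(conversion_rate = sum(converted) / sum(approved)) %>%
    # Assumes ggplot2 >= 3.0.0, where aes() understands !! directly.
    ggplot(aes(x = !!x, y = !!y, color = !!group)) +
    geom_line()
}

p <- plot_lines(my_df, month, conversion_rate, category)
```

With this version the same unquoting operator is used in both the dplyr and ggplot2 steps, which addresses the inconsistency raised in the question.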



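Answer 2:

You could just do a straight eval.parent(substitute(...)) like this. substitute replaces df, x, y and group in the pipeline with the expressions the caller supplied, and eval.parent then evaluates the resulting code in the caller's frame. Because this is plain base R, it does not depend on how dplyr or ggplot2 implement non-standard evaluation, and it is simple to write. You can even use an ordinary aes.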

plot_lines <- function(df, x, y, group) eval.parent(substitute(
   df %>%
      group_by(x, group) %>%
      my_smry %>%
      ggplot + geom_line(aes(x = x, y = y, group = group, color = group))
))
plot_lines(my_df, month, conversion_rate, category)
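
To see what the substitute step produces before anything is evaluated, here is a small base R sketch. The helper show_call is hypothetical and only returns the substituted expression instead of running it.

```r
# Hypothetical helper: returns the substituted call without evaluating it.
show_call <- function(df, x, group) {
  substitute(group_by(df, x, group))
}

# Nothing is evaluated here, so neither dplyr nor my_df needs to exist.
call_text <- deparse(show_call(my_df, month, category))
print(call_text)
# [1] "group_by(my_df, month, category)"
```

The argument names have been swapped for what the caller typed, which is why the pipeline in plot_lines behaves as if it had been written by hand at the call site.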