Tidyeval: pass list of columns as quosure to select()

Posted 2019-05-22 20:31

Question:

I want to pass a bunch of columns to pmap() inside mutate(). Later, I want to select those same columns.

At the moment, I'm passing a list of column names to pmap() as a quosure, which works fine, although I have no idea whether this is the "right" way to do it. But I can't figure out how to use the same quosure/list for select().

I've got almost no experience with tidyeval; I've only got this far by playing around. I imagine there must be a way to use the same thing for both pmap() and select(), preferably without having to put each of my column names in quotation marks, but I haven't found it yet.

library(dplyr)
library(rlang)
library(purrr)

df <- tibble(a = 1:3,
             b = 101:103) %>% 
    print
#> # A tibble: 3 x 2
#>       a     b
#>   <int> <int>
#> 1     1   101
#> 2     2   102
#> 3     3   103

cols_quo <- quo(list(a, b))

df2 <- df %>% 
    mutate(outcome = !!cols_quo %>% 
               pmap_int(function(..., word) {
                   args <- list(...)

                   # just to be clear this isn't what I actually want to do inside pmap
                   return(args[[1]] + args[[2]])
               })) %>% 
    print()
#> # A tibble: 3 x 3
#>       a     b outcome
#>   <int> <int>   <int>
#> 1     1   101     102
#> 2     2   102     104
#> 3     3   103     106

# I get why this doesn't work, but I don't know how to do something like this that does
df2 %>% 
    select(!!cols_quo)
#> Error in .f(.x[[i]], ...): object 'a' not found

Answer 1:

This is a bit tricky because of the mix of semantics involved in this problem. pmap() takes a list and passes each element as its own argument to a function (it's kind of equivalent to !!! in that sense). Your quoting function thus needs to quote its arguments and somehow pass a list of columns to pmap().
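
To make the pmap() semantics concrete, here is a minimal sketch outside of any data frame. The variable names (cols, sums) are just for illustration and are not part of the original question:

```r
library(purrr)

# pmap_int() walks the list elements in parallel and passes them as
# separate arguments: call 1 gets (1, 101), call 2 gets (2, 102), etc.
cols <- list(1:3, 101:103)
sums <- pmap_int(cols, function(x, y) x + y)
sums
#> [1] 102 104 106
```

So whatever we build with tidy eval has to end up as an actual list of column vectors by the time pmap() sees it.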

Our quoting function can go one of two ways. Either quote (i.e., delay) the list creation, or create an actual list of quoted expressions right away:

quoting_fn1 <- function(...) {
  exprs <- enquos(...)

  # For illustration purposes, return the quoted inputs instead of
  # doing something with them. Normally you'd call `mutate()` here:
  exprs
}

quoting_fn2 <- function(...) {
  expr <- quo(list(!!!enquos(...)))

  expr
}

Since our first variant does nothing but return a list of quoted inputs, it's actually equivalent to quos():

quoting_fn1(a, b)
#> <list_of<quosure>>
#>
#> [[1]]
#> <quosure>
#> expr: ^a
#> env:  global
#>
#> [[2]]
#> <quosure>
#> expr: ^b
#> env:  global

The second version returns a quoted expression that instructs R to create a list with quoted inputs:

quoting_fn2(a, b)
#> <quosure>
#> expr: ^list(^a, ^b)
#> env:  0x7fdb69d9bd20

There is a subtle but important difference between the two. The first version creates an actual list object:

exprs <- quoting_fn1(a, b)
typeof(exprs)
#> [1] "list"

On the other hand, the second version does not return a list, it returns an expression for creating a list:

expr <- quoting_fn2(a, b)
typeof(expr)
#> [1] "language"
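
As a small sketch of what "an expression for creating a list" means in practice: the list only comes into existence once the quosure is evaluated against the data. Here quo(list(a, b)) stands in for the result of quoting_fn2(a, b), and rlang::eval_tidy() is used to evaluate it by hand (mutate() does the equivalent for you):

```r
library(dplyr)
library(rlang)

df <- tibble(a = 1:3, b = 101:103)

# A delayed list: nothing is looked up yet.
expr <- quo(list(a, b))

# Evaluating the quosure with the data frame as a data mask
# finally produces an actual list of the two columns.
cols <- eval_tidy(expr, data = df)
typeof(cols)
#> [1] "list"
length(cols)
#> [1] 2
```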

Let's find out which version is more appropriate for interfacing with pmap(). But first we'll give a name to the pmapped function to make the code clearer and easier to experiment with:

myfunction <- function(..., word) {
  args <- list(...)
  # just to be clear this isn't what I actually want to do inside pmap
  args[[1]] + args[[2]]
}

Understanding how tidy eval works is hard in part because we usually don't get to observe the unquoting step. We'll use rlang::qq_show() to reveal the result of unquoting expr (the delayed list) and exprs (the actual list) with !!:

rlang::qq_show(
  mutate(df, outcome = pmap_int(!!expr, myfunction))
)
#> mutate(df, outcome = pmap_int(^list(^a, ^b), myfunction))

rlang::qq_show(
  mutate(df, outcome = pmap_int(!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))

When we unquote the delayed list, mutate() calls pmap_int() with list(a, b), evaluated in the data frame, which is exactly what we need:

mutate(df, outcome = pmap_int(!!expr, myfunction))
#> # A tibble: 3 x 3
#>       a     b outcome
#>   <int> <int>   <int>
#> 1     1   101     102
#> 2     2   102     104
#> 3     3   103     106

On the other hand, if we unquote an actual list of quoted expressions, we get an error:

mutate(df, outcome = pmap_int(!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#>   Evaluation error: Element 1 is not a vector (language).

That's because the quoted expressions inside the list are not evaluated in the data frame. In fact, they are not evaluated at all. pmap() gets the quoted expressions as is, which it doesn't understand. Recall what qq_show() has shown us:

#> mutate(df, outcome = pmap_int(<S3: quosures>, myfunction))

Anything inside angle brackets is passed as is. This is a sign that we should somehow have used !!! instead, to inline each element of the list of quosures in the surrounding expression. Let's try it:

rlang::qq_show(
  mutate(df, outcome = pmap_int(!!!exprs, myfunction))
)
#> mutate(df, outcome = pmap_int(^a, ^b, myfunction))

Hmm... Doesn't look right. We're supposed to pass a list to pmap_int(), and here it gets each quoted input as a separate argument. Indeed we get a type error:

mutate(df, outcome = pmap_int(!!!exprs, myfunction))
#> Error in mutate_impl(.data, dots) :
#>   Evaluation error: `.x` is not a list (integer).

That's easy to fix: just splice into a call to list():

rlang::qq_show(
  mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
)
#> mutate(df, outcome = pmap_int(list(^a, ^b), myfunction))

And voilà!

mutate(df, outcome = pmap_int(list(!!!exprs), myfunction))
#> # A tibble: 3 x 3
#>       a     b outcome
#>   <int> <int>   <int>
#> 1     1   101     102
#> 2     2   102     104
#> 3     3   103     106
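
Putting this together for the original question, here is one possible sketch of a wrapper that captures the columns once and reuses them for both pmap_int() and select(). The helper name sum_and_keep() and the extra column z are made up for this example; the summing function is the same placeholder as above:

```r
library(dplyr)
library(purrr)
library(rlang)

df <- tibble(a = 1:3, b = 101:103, z = 7:9)

# Hypothetical helper (not from the original post): quote the columns
# once with enquos(), then splice the same quosures in two places.
sum_and_keep <- function(data, ...) {
  cols <- enquos(...)

  data %>%
    # list(!!!cols) becomes list(a, b), evaluated in the data frame
    mutate(outcome = pmap_int(list(!!!cols), function(...) {
      args <- list(...)
      args[[1]] + args[[2]]
    })) %>%
    # select() accepts the spliced quosures directly
    select(!!!cols, outcome)
}

res <- sum_and_keep(df, a, b)
res
```

The column names are passed unquoted (a, b), and z is dropped because it was not among the captured columns.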


Answer 2:

We can use quos() when there is more than one column, and splice the result into the call with !!!:

cols_quo <- quos(a, b)
df2 %>%
    select(!!!cols_quo)

An equivalent of 'df2' (with the new column named output) can be created with:

df %>%
    mutate(output = list(!!! cols_quo) %>% 
        reduce(`+`))

If we want to use the quosure as in the OP's post:

cols_quo <- quo(list(a, b))
df2 %>%
    # quo_squash() is the newer name for quo_expr(), which is deprecated in recent rlang
    select(!!! as.list(quo_squash(cols_quo))[-1])
#> # A tibble: 3 x 2
#>       a     b
#>   <int> <int>
#> 1     1   101
#> 2     2   102
#> 3     3   103
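
To see why the [-1] is needed, here is a small sketch that takes the quosure apart by hand. It assumes rlang::quo_squash() (to my knowledge the current replacement for quo_expr()); the names parts, args and arg_names are only for illustration:

```r
library(rlang)

cols_quo <- quo(list(a, b))

# A call converted with as.list() has the function name first,
# followed by its arguments: `list`, `a`, `b`.
parts <- as.list(quo_squash(cols_quo))
length(parts)
#> [1] 3

# Dropping the first element leaves just the column symbols,
# which is what gets spliced into select().
args <- parts[-1]
arg_names <- vapply(args, as_string, character(1))
arg_names
#> [1] "a" "b"
```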