confusing behavior of purrr::pmap with rlang; “to

Posted 2019-04-28 11:40

Question:

I have a custom function that uses rlang to read user-supplied variables from a data frame. The function works fine whether the arguments are quoted or unquoted. Strangely, though, when it is used with purrr::pmap, it works only if the arguments are quoted.

So I have two questions:

  1. Why does the function behave this way?

  2. How can I make a function using rlang such that I won't have to quote the arguments even if used in purrr::pmap?

Here is a minimal reprex that uses a simple function to highlight this issue:

# loading the needed libraries
library(rlang)
library(dplyr)
library(purrr)


# defining the function
tryfn <- function(data, x, y) {
  data <-
    dplyr::select(
      .data = data,
      x = !!rlang::enquo(x),
      y = !!rlang::enquo(y)
    )

  # creating a dataframe of means
  result_df <- data.frame(mean.x = mean(data$x), mean.y = mean(data$y))

  # return the dataframe
  return(result_df)
}

# without quotes (works!)
tryfn(iris, Sepal.Length, Sepal.Width)
#>     mean.x   mean.y
#> 1 5.843333 3.057333

# with quotes (works!)
tryfn(iris, "Sepal.Length", "Sepal.Width")
#>     mean.x   mean.y
#> 1 5.843333 3.057333

# pmap without quotes (doesn't work)
purrr::pmap(.l = list(
  data = list(iris, mtcars, ToothGrowth),
  x = list(Sepal.Length, wt, len),
  y = list(Sepal.Width, mpg, dose)
),
.f = tryfn)
#> Error in is.data.frame(.l): object 'Sepal.Length' not found

# pmap with quotes (works!)
purrr::pmap(.l = list(
  data = list(iris, mtcars, ToothGrowth),
  x = list("Sepal.Length", "wt", "len"),
  y = list("Sepal.Width", "mpg", "dose")
),
.f = tryfn)
#> [[1]]
#>     mean.x   mean.y
#> 1 5.843333 3.057333
#> 
#> [[2]]
#>    mean.x   mean.y
#> 1 3.21725 20.09062
#> 
#> [[3]]
#>     mean.x   mean.y
#> 1 18.81333 1.166667

Created on 2018-05-21 by the reprex package (v0.2.0).

Answer 1:

The problem is that list() evaluates its arguments. R sees the symbols Sepal.Length, wt, and len, tries to look them up as objects in the current environment, and fails because they are columns of data frames, not objects in that environment. When you quote them, they are plain strings, so R has nothing to look up and simply stores the values.

If you replace list with base::alist, dplyr::vars, or rlang::exprs, it should work.

Note: as we already quote the inputs, we don't need to use rlang::enquo inside tryfn anymore.
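To see the difference between list() and alist() in isolation, here is a minimal base-R sketch. It assumes a fresh session in which no object named Sepal.Length exists; the variable names res, cols, and iris_col are only illustrative.

```r
# list() evaluates each argument, so a bare column name is looked up as an
# object; with no such object in the environment, the call fails
res <- tryCatch(
  list(Sepal.Length),
  error = function(e) conditionMessage(e)
)
res  # the error message, returned as a string

# alist() leaves its arguments unevaluated and returns them as symbols
cols <- alist(Sepal.Length, wt, len)
vapply(cols, is.symbol, logical(1))

# a symbol can later be resolved against a data frame, e.g. via its name
iris_col <- iris[[as.character(cols[[1]])]]
mean(iris_col)
```

Because the symbols are only stored, nothing is looked up until a function such as dplyr::select resolves them inside the right data frame.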

# loading the needed libraries
library(rlang)
library(tidyverse)

# defining the function
tryfn <- function(data, x, y) {
  data <-
    dplyr::select(
      .data = data,
      x = !! x,
      y = !! y
    )

  # creating a data frame of means
  result_df <- data.frame(mean.x = mean(data$x), mean.y = mean(data$y))

  # return the data frame
  return(result_df)
}

# alist handles its arguments as if they described function arguments. 
# So the values are not evaluated, and tagged arguments with no value are 
# allowed whereas list simply ignores them. 

purrr::pmap(.l = list(
  data = list(iris, mtcars, ToothGrowth),
  x    = alist(Sepal.Length, wt, len),
  y    = alist(Sepal.Width, mpg, dose)
),
.f = tryfn)

#> [[1]]
#>     mean.x   mean.y
#> 1 5.843333 3.057333
#> 
#> [[2]]
#>    mean.x   mean.y
#> 1 3.21725 20.09062
#> 
#> [[3]]
#>     mean.x   mean.y
#> 1 18.81333 1.166667


purrr::pmap(.l = list(
  data = list(iris, mtcars, ToothGrowth),
  x    = dplyr::vars(Sepal.Length, wt, len),
  y    = dplyr::vars(Sepal.Width, mpg, dose)
),
.f = tryfn)

#> [[1]]
#>     mean.x   mean.y
#> 1 5.843333 3.057333
#> 
#> [[2]]
#>    mean.x   mean.y
#> 1 3.21725 20.09062
#> 
#> [[3]]
#>     mean.x   mean.y
#> 1 18.81333 1.166667

purrr::pmap(.l = list(
  data = list(iris, mtcars, ToothGrowth),
  x    = rlang::exprs(Sepal.Length, wt, len),
  y    = rlang::exprs(Sepal.Width, mpg, dose)
),
.f = tryfn)

#> [[1]]
#>     mean.x   mean.y
#> 1 5.843333 3.057333
#> 
#> [[2]]
#>    mean.x   mean.y
#> 1 3.21725 20.09062
#> 
#> [[3]]
#>     mean.x   mean.y
#> 1 18.81333 1.166667

Created on 2018-05-21 by the reprex package (v0.2.0).



Answer 2:

The issue isn't with purrr, really. The same behavior can be observed with:

list(Sepal.Length) # Error: object 'Sepal.Length' not found

As I understand it, all of the magic with !!, enquo, and the like is available when you're passing arguments into a function you have created. That's why it works to pass in the unquoted field names to tryfn() directly.
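A small sketch of that capturing step may help. capture_arg() is a hypothetical helper introduced only for illustration; it is not part of the question's code.

```r
library(rlang)

# capture_arg() is a hypothetical helper, used here only for illustration
capture_arg <- function(x) {
  # enquo() captures the expression supplied for x without evaluating it
  rlang::enquo(x)
}

# no error here, even though no object named Sepal.Length exists
q <- capture_arg(Sepal.Length)

# the quosure holds the bare symbol that was typed at the call site
captured <- rlang::quo_get_expr(q)
is.symbol(captured)
as.character(captured)
```

The capture happens at the function boundary, which is why it only helps when the bare name is written directly in the call to the function.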

But with pmap(), you're putting the field names (Sepal.Width, wt, etc.) in a list definition, and list() tries to evaluate them right away and fails. So pmap never even gets a chance to pass things into tryfn(), since your list barfs on definition.

Passing in your field names as strings works just fine, as list can accommodate that data type, and then pmap has the chance to map them into tryfn().

Hadley's review of quasiquotation with dplyr might be useful to you.

To answer your second question:

How can I make a function using rlang such that I won't have to quote the arguments even if used in purrr::pmap?

You can wrap your field names with quo() to avoid literally quoting them as strings, although I'm not sure that's much of an improvement:

purrr::pmap(.l = list(
  data = list(iris, mtcars, ToothGrowth),
  x = list(quo(Sepal.Length), quo(wt), quo(len)),
  y = list(quo(Sepal.Width), quo(mpg), quo(dose))
),
.f = tryfn) %>%
  bind_rows(., .id = "dataset")
#>   dataset    mean.x    mean.y
#> 1       1  5.843333  3.057333
#> 2       2  3.217250 20.090625
#> 3       3 18.813333  1.166667
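If wrapping every name in quo() feels tedious, rlang::quos() captures several expressions in one call. The sketch below only shows what quos() returns and how a single quosure can be resolved against a data frame with rlang::eval_tidy(); it has not been checked inside the pmap() call above, so treat that combination as an assumption.

```r
library(rlang)

# quos() captures each argument as a quosure without evaluating it
qs <- rlang::quos(Sepal.Length, Sepal.Width)
length(qs)
rlang::is_quosure(qs[[1]])

# eval_tidy() evaluates a quosure with the data frame's columns in scope
m <- mean(rlang::eval_tidy(qs[[1]], data = iris))
m
```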