Use dplyr::case_when with arguments programmatical

2020-05-27 15:24发布

问题:

I'd like to be able to use dplyr's case_when in a programmatic way to replace base R cut()function.

Currently, case_when can be used with an external argument through NSE like :

library(dplyr)
library(rlang)

patterns <- list(
  x <= 2 ~ "<=2",
  x <= 4 ~ "2<->4",
  x > 4 ~ ">4"
 )
 x <- 1:10
 case_when(!!!patterns)

What I want to do is : use it with another variable, inside a mutate

The idea would be something like this, although I can't figure out how to get it to work :

library(dplyr)
patterns_lazy <- list(
  !!quo(x) <= 2 ~ "<=2",
  !!quo(x) <= 4 ~ "2<->4",
  !!quo(x) > 4 ~ ">4"
)
x <- "cyl"
mtcars %>% mutate(ABC = case_when(!!!patterns_lazy))

I'd like to be able to define the column (inside a string) that I want to filter on, and retrieve something like this (this example is not working as it's the desired syntax):

x <- "cyl"
mtcars %>%
  select(cyl) %>%
  mutate(ABC = case_when(!!!patterns_lazy)) %>%
  head()

  cyl ABC
1   6 >4
2   6 >4
3   4 2<->4
4   6 >4
5   8 >4
6   6 >4

Thanks for any help :)

回答1:

You cannot use !! there:

patterns <- list(
  !!quo(x) <= 2 ~ "<=2",
  !!quo(x) <= 4 ~ "2<->4",
  !!quo(x) > 4 ~ ">4"
)
  1. Neither list() nor ~ support quasiquotation.
  2. If it did support quasiquotation, you'd need to be careful with operator precedence and wrap your !!quo() in parentheses.
  3. And finally, that quote to x would evaluate to a string and you'd be comparing numbers with strings (in your example it would be "cyl) which R will do happily thanks to implicit coercions :/

So you need to use exprs() instead of list(), and use x with the .data pronoun instead of a quote of x.

exprs() will create a list of unevaluated expressions. Unevaluated is good: if your formula was evaluated, it would carry an environment (here the global env) and that environment doesn't contain any of the data supplied to dplyr, and in particular doesn't have the .data pronoun. On the other hand if the formulas are "context-less", they get evaluated in the data context which is what we want.

patterns_lazy <- exprs(
  .data[[x]] <= 2 ~ "<=2",
  .data[[x]] <= 4 ~ "2<->4",
  .data[[x]] > 4 ~ ">4"
)

x <- "cyl"
pull(mutate(mtcars, case_when(!!!patterns_lazy)))
#>  [1] ">4"    ">4"    "2<->4" ">4"    ">4"    ">4"    ">4"    "2<->4" "2<->4"
#> [10] ">4"    ">4"    ">4"    ">4"    ">4"    ">4"    ">4"    ">4"    "2<->4"
#> [19] "2<->4" "2<->4" "2<->4" ">4"    ">4"    ">4"    ">4"    "2<->4" "2<->4"
#> [28] "2<->4" ">4"    ">4"    ">4"    "2<->4"


回答2:

Here is one option with ifelse

f1 <- function(data, x){
       x <- enquo(x)       
       f2 <- function(y) ifelse(y <= 2, "<=2", ifelse(y <=4, "2<->4", ">4"))

  data %>%
          mutate( ABC = f2(UQ(x)))
 }

f1(mtcars, cyl) %>%
              head()
#   mpg cyl disp  hp drat    wt  qsec vs am gear carb   ABC
#1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4    >4
#2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4    >4
#3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1 2<->4
#4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    >4
#5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2    >4
#6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1    >4


标签: r dplyr rlang