dynamic dplyr column name calculation

Posted 2020-07-30 01:57

Question:

I have the following code.

colName is passed in as a string. I've been trying to get the chain to use the value of colName as the column name, but have not had much success. I've tried "eval", "setNames", etc. Using the "_" variants of the verbs has not worked either.

Essentially, if my colName = "MyCol", I want the dplyr chain to execute as if the last line read:

mutate(MyCol = ifelse(is.na(MyCol), "BLANK", MyCol))

makeSummaryTable <- function(colName,originalData){
  result <- originalData %>% 
    group_by_(colName) %>% 
    summarise(numObs = n()) %>% 
    ungroup() %>% 
    arrange(desc(numObs)) %>% 
    rowwise() %>% 
    mutate_(colName = ifelse(is.na(colName), "BLANK",colName))
  return(result)
}

Answer 1:

Here's how to do it with dplyr 0.6.0 using the new tidyeval approach to non-standard evaluation. (I'm not sure if it's even possible to do with standard evaluation, at least in a straightforward manner):

library(dplyr)

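makeSummaryTable <- function(colName, originalData){

  # Capture the unquoted column name as a quosure
  colName <- enquo(colName)

  originalData %>% 
    count(!!colName) %>%        # replaces group_by + summarise(n = n())
    arrange(desc(n)) %>%
    mutate(
      old_col = !!colName,      # keep the original values for comparison
      # := allows an unquoted name on the left-hand side
      !!quo_name(colName) := if_else(is.na(!!colName), "BLANK", !!colName)
    )
}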

makeSummaryTable(hair_color, starwars)
#> # A tibble: 13 x 3
#>       hair_color     n       old_col
#>            <chr> <int>         <chr>
#>  1          none    37          none
#>  2         brown    18         brown
#>  3         black    13         black
#>  4         BLANK     5          <NA>
#>  5         white     4         white
#>  6         blond     3         blond
#>  7        auburn     1        auburn
#>  8  auburn, grey     1  auburn, grey
#>  9 auburn, white     1 auburn, white
#> 10        blonde     1        blonde
#> 11   brown, grey     1   brown, grey
#> 12          grey     1          grey
#> 13       unknown     1       unknown

enquo captures the unquoted column name as a quosure: the expression together with the environment it was written in. !! then unquotes the quosure so that it is evaluated as if it had been typed directly in the function. For a more in-depth and accurate explanation, see Hadley's "Programming with dplyr".
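To see enquo and !! in isolation, here is a minimal sketch. The helper name count_by and the toy data frame are made up for illustration; only enquo, !! and count come from dplyr.

```r
library(dplyr)

# Hypothetical helper: counts rows per value of a column passed unquoted.
count_by <- function(data, col) {
  col <- enquo(col)        # capture the expression the caller wrote for `col`
  data %>% count(!!col)    # unquote it: behaves like count(data, <that column>)
}

# Toy data (an assumption, not from the original post)
df <- tibble(g = c("a", "b", "a", NA))
res <- count_by(df, g)
res
```

Without enquo and !!, count(col) would look for a column literally named col.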

EDIT: I realized that the original question was to name the new column with the user-supplied value of colName, and not literally "colName", so I updated my answer. To accomplish that, the quosure needs to be turned into a string (or label) using quo_name. Then it can be unquoted using !!, just as a regular quosure would be. The only caveat is that mutate(!!foo = bar) is not valid R syntax, so tidyeval introduces the new definition operator := (which might be familiar to users of data.table, where it has a somewhat different use). Unlike the traditional assignment operator =, the := operator allows unquoting on both the left-hand and the right-hand side.
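Since the question passes the column name as a string (colName = "MyCol"), the same := idea can also be sketched for string input. This is not the answerer's code: makeSummaryTableStr is a hypothetical name, the toy data is invented, and the sketch assumes a reasonably recent dplyr (for the name argument of count) together with rlang::sym to turn the string into a symbol.

```r
library(dplyr)

# Hypothetical string-based variant; colName is a character scalar such as "MyCol".
# Assumes the column is of type character, since if_else() is type-strict.
makeSummaryTableStr <- function(colName, originalData) {
  col <- rlang::sym(colName)                 # string -> symbol
  originalData %>%
    count(!!col, name = "numObs") %>%        # one row per distinct value
    arrange(desc(numObs)) %>%
    # a string on the left of := becomes the column name
    mutate(!!colName := if_else(is.na(!!col), "BLANK", !!col))
}

# Toy data (an assumption, not from the original post)
df <- tibble(MyCol = c("a", NA, "a", "b", NA, "a"))
res <- makeSummaryTableStr("MyCol", df)
res
```

The left-hand side of := accepts a plain string here, so quo_name is not needed when the name already arrives as a string.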

(updated the answer to use a dataframe that has NA in one of its rows, to illustrate that the last mutate works. I also used count instead of group by + summarize, and I dropped the unnecessary rowwise.)



Tags: r dynamic dplyr