Using dplyr functions within another function

Posted 2019-08-31 03:25

Question:

I've been struggling with this issue, which is quite similar to a question raised here before. Somehow I can't translate the solution given in that question to my own problem.

I start by creating an example data frame:

test.df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))
str(test.df)

The following function should create a new data frame with the mean of a "statvar" based on groups of a "groupvar".

test.f <- function(df, groupvar, statvar) {
  df %>% 
    group_by_(groupvar) %>% 
    select_(statvar) %>%
    summarise_(
      avg = ~mean(statvar, na.rm = TRUE)
    )
} 

test.f(df = test.df,
       groupvar = "col1",
       statvar = "col2")

What I would like this to return is a data frame with two calculated averages (one for all a values in col1 and one for all b values in col1). Instead I get this:

  col1 avg
1    a  NA
2    b  NA
Warning messages:
1: In mean.default("col2", na.rm = TRUE) :
  argument is not numeric or logical: returning NA
2: In mean.default("col2", na.rm = TRUE) :
  argument is not numeric or logical: returning NA

I find this strange because I'm pretty sure col2 is numeric:

str(test.df)
'data.frame':   10 obs. of  2 variables:
 $ col1: Factor w/ 2 levels "a","b": 1 1 1 1 1 2 2 2 2 2
 $ col2: num  0.4269 0.1928 0.7766 0.0865 0.1798 ...

Answer 1:

library(lazyeval)
library(dplyr)

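test.f <- function(df, groupvar, statvar) {
  # Use the df argument rather than the global test.df
  df %>% 
    group_by_(groupvar) %>% 
    select_(statvar) %>%
    summarise_(
      # interp() replaces the statvar placeholder in the formula
      # with the column symbol built from the string
      avg = (~mean(statvar, na.rm = TRUE)) %>%
        interp(statvar = as.name(statvar))
    )
} 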

test.f(df = test.df,
       groupvar = "col1",
       statvar = "col2")

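Your issue is that the string "col2" is being substituted for statvar, so summarise_ ends up calling mean("col2"). That is the mean of a character string rather than of the column, which is why you get NA and the "argument is not numeric or logical" warning.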
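The difference between the string and the column symbol can be seen without dplyr at all. The following is a minimal base R sketch, not part of the original answer; the data frame df and the variable names res_string and res_symbol are made up for illustration, and bquote() stands in for what interp() does in the answer above.

```r
statvar <- "col2"
df <- data.frame(col2 = c(1, 2, 3))  # hypothetical example data

# Passing the string itself: mean() receives a character vector,
# warns, and returns NA.
res_string <- suppressWarnings(mean(statvar, na.rm = TRUE))
res_string
# NA

# Converting the string to a symbol first builds the call
# mean(col2, na.rm = TRUE), which is then evaluated inside df.
stat_call <- bquote(mean(.(as.name(statvar)), na.rm = TRUE))
res_symbol <- eval(stat_call, df)
res_symbol
# 2
```

The call to as.name() is the step that matters: it turns "col2" from a value into a reference to the column.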



Answer 2:

The soon-to-be-released dplyr 0.6.0 adds functionality that helps here. The new function UQ() unquotes what has been quoted; the !! operator used below is its shorthand. You are passing statvar as a string such as "col2". dplyr's underscore variants, such as group_by_ and select_, accept strings directly, but for summarise_ the string has to be turned into an expression, which can get ugly, as in the answer above. With unquoting, we can use the regular summarise function and unquote the quoted variable name. For more on what "unquote the quoted" means, see this vignette. For now, only the development version has it.

library(dplyr)
test.df <- data.frame(col1 = rep(c('a','b'), each=5), col2 = runif(10))
test.f <- function(df, groupvar, statvar) {
  q_statvar <- as.name(statvar)
  df %>% 
    group_by_(groupvar) %>% 
    select_(statvar) %>%
    summarise(
      avg = mean(!!q_statvar, na.rm = TRUE)
    )
} 

test.f(df = test.df,
       groupvar = "col1",
       statvar = "col2")
# # A tibble: 2 × 2
#     col1       avg
#   <fctr>     <dbl>
# 1      a 0.6473072
# 2      b 0.4282954
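The same unquoting approach can be applied to the grouping variable, which removes the need for the underscore verbs (they are deprecated in later dplyr releases). This is only a sketch under assumptions: it requires a dplyr version that supports !! (0.7.0 or later), and the function name group_mean and the data frame demo.df are hypothetical names chosen for illustration, with fixed values instead of runif() so the result is reproducible.

```r
library(dplyr)

# Hypothetical helper: both column names arrive as strings
group_mean <- function(df, groupvar, statvar) {
  group_sym <- as.name(groupvar)  # string -> symbol for grouping
  stat_sym  <- as.name(statvar)   # string -> symbol for the statistic
  df %>%
    group_by(!!group_sym) %>%
    summarise(avg = mean(!!stat_sym, na.rm = TRUE))
}

# Fixed example data so the averages are known in advance
demo.df <- data.frame(col1 = rep(c("a", "b"), each = 3),
                      col2 = c(1, 2, 3, 10, 20, 30))

res <- group_mean(demo.df, groupvar = "col1", statvar = "col2")
res$avg
# 2 for group a, 20 for group b
```

The select_ step is dropped here because summarise only keeps the grouping column and the new avg column anyway.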


Tags: r dplyr