dplyr string as column reference

Posted 2019-01-15 15:00

Question:

Is there any way to pass a string as a column reference to a dplyr function?

Here is an example with a grouped dataset and a simple function in which I try to pass a string as a reference to a column. Thanks!

machines <- data.frame(Date=c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"), 
            Model.Num=c("123", "456", "123", "456", "123", "456"), 
            Cost=c(200, 300, 250, 350, 300, 400))

my.fun <- function(data, colname){
    mutate(data, position=cumsum(as.name(colname)))
}

machines <- machines %>% group_by(Date, Model.Num)     
machines <- my.fun(machines, "Cost")

Answer 1:

Here's an option that uses interp() from the lazyeval package, which came with your dplyr install. Inside your function, use the standard-evaluation version of the dplyr verb, which in this case is mutate_().

Note that the new column position will be identical to the Cost column here, because machines is grouped by Date and Model.Num and every such group contains a single row. The second call to my_fun() shows it working on a different set of grouping variables.

library(dplyr)
library(lazyeval)

my_fun <- function(data, col) {
    mutate_(data, position = interp(~ cumsum(x), x = as.name(col)))
}

my_fun(machines, "Cost")
#        Date Model.Num Cost position
# 1 1/31/2014       123  200      200
# 2 1/31/2014       456  300      300
# 3 2/28/2014       123  250      250
# 4 2/28/2014       456  350      350
# 5 3/31/2014       123  300      300
# 6 3/31/2014       456  400      400

## second example - different grouping
my_fun(group_by(machines, Model.Num), "Cost")
#        Date Model.Num Cost position
# 1 1/31/2014       123  200      200
# 2 1/31/2014       456  300      300
# 3 2/28/2014       123  250      450
# 4 2/28/2014       456  350      650
# 5 3/31/2014       123  300      750
# 6 3/31/2014       456  400     1050
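
The underscored verbs such as mutate_() have been deprecated since dplyr 0.7.0. As a minimal sketch of the same idea, assuming a recent dplyr, the .data pronoun can look up a column from a string. The helper name my_fun2 is hypothetical and is not part of the original answer.

```r
library(dplyr)

machines <- data.frame(
  Date = c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"),
  Model.Num = c("123", "456", "123", "456", "123", "456"),
  Cost = c(200, 300, 250, 350, 300, 400)
)

# Hypothetical helper: `.data[[col]]` fetches the column whose name
# is stored in the string `col`, within the current group.
my_fun2 <- function(data, col) {
  mutate(data, position = cumsum(.data[[col]]))
}

res <- my_fun2(group_by(machines, Model.Num), "Cost")
res$position
# [1]  200  300  450  650  750 1050
```

The result matches the second example above: a running total of Cost within each Model.Num.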


Answer 2:

This can also be done with standard evaluation without the lazyeval package: build the expression as a string and use setNames() to give it the name of the new column.
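
To make the string-building step concrete, here is a small base R sketch of the named character vector that results. The variable names col and dots are only illustrative.

```r
col <- "Cost"

# paste0() assembles the expression text; setNames() attaches the
# name that will become the new column's name.
dots <- setNames(paste0("cumsum(", col, ")"), "position")

names(dots)   # "position"
unname(dots)  # "cumsum(Cost)"
```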

library(tidyverse)

machines <- data.frame(
  Date = c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"), 
  Model.Num = c("123", "456", "123", "456", "123", "456"), 
  Cost = c(200, 300, 250, 350, 300, 400)
  )

my_fun <- function(data, col) {
  mutate_(data, .dots = setNames(paste0("cumsum(", col, ")"), "position"))
}

my_fun(machines %>% group_by(Date, Model.Num), "Cost")
# Source: local data frame [6 x 4]
# Groups: Date, Model.Num [6]
# 
# Date Model.Num  Cost position
# <fctr>    <fctr> <dbl>    <dbl>
# 1 1/31/2014       123   200      200
# 2 1/31/2014       456   300      300
# 3 2/28/2014       123   250      250
# 4 2/28/2014       456   350      350
# 5 3/31/2014       123   300      300
# 6 3/31/2014       456   400      400
my_fun(machines %>% group_by(Model.Num), "Cost")
# Source: local data frame [6 x 4]
# Groups: Model.Num [2]
# 
# Date Model.Num  Cost position
# <fctr>    <fctr> <dbl>    <dbl>
# 1 1/31/2014       123   200      200
# 2 1/31/2014       456   300      300
# 3 2/28/2014       123   250      450
# 4 2/28/2014       456   350      650
# 5 3/31/2014       123   300      750
# 6 3/31/2014       456   400     1050
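
With current dplyr, where mutate_() is deprecated, a comparable string-to-expression approach could be sketched with rlang::parse_expr() and the !! operator. This assumes the rlang package is available (it is a dependency of dplyr). The helper name my_fun3 is hypothetical. Parsing strings runs whatever code they contain, so this only suits trusted input.

```r
library(dplyr)

machines <- data.frame(
  Date = c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"),
  Model.Num = c("123", "456", "123", "456", "123", "456"),
  Cost = c(200, 300, 250, 350, 300, 400)
)

# Hypothetical helper: build the expression text, parse it into a call,
# then unquote it with `!!` so mutate() evaluates it against the data.
my_fun3 <- function(data, col) {
  expr <- rlang::parse_expr(paste0("cumsum(", col, ")"))
  mutate(data, position = !!expr)
}

out <- my_fun3(group_by(machines, Model.Num), "Cost")
out$position
# [1]  200  300  450  650  750 1050
```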


Tags: r, dplyr