Is there any way to pass a string as a column reference to a dplyr function?
Here is an example with a grouped dataset and a simple function where I try to pass a string as a reference to a column. Thanks!
machines <- data.frame(Date=c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"),
                       Model.Num=c("123", "456", "123", "456", "123", "456"),
                       Cost=c(200, 300, 250, 350, 300, 400))
my.fun <- function(data, colname){
  mutate(data, position=cumsum(as.name(colname)))
}
machines <- machines %>% group_by(Date, Model.Num)
machines <- my.fun(machines, "Cost")
Here's an option that uses interp() from the lazyeval package, which came with your dplyr install. Inside your function(s), you'll need to use the standard evaluation version of the dplyr functions. In this case that would be mutate_().
Note that the new column position will be identical to the Cost column here because machines is grouped by Date and Model.Num, and each of those groups contains a single row, so the cumulative sum never accumulates. The second call to my_fun() shows it working on a different set of grouping variables.
library(dplyr)
library(lazyeval)
my_fun <- function(data, col) {
  mutate_(data, position = interp(~ cumsum(x), x = as.name(col)))
}
my_fun(machines, "Cost")
# Date Model.Num Cost position
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 250
# 4 2/28/2014 456 350 350
# 5 3/31/2014 123 300 300
# 6 3/31/2014 456 400 400
## second example - different grouping
my_fun(group_by(machines, Model.Num), "Cost")
# Date Model.Num Cost position
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 450
# 4 2/28/2014 456 350 650
# 5 3/31/2014 123 300 750
# 6 3/31/2014 456 400 1050
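For anyone on a newer dplyr: the underscore verbs such as mutate_() were deprecated starting with dplyr 0.7, and the .data pronoun is one way to refer to a column by a string inside plain mutate(). The following is only a sketch under that assumption; my_fun2 is a hypothetical name used here to avoid clashing with my_fun above, and the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)

machines <- data.frame(
  Date = c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"),
  Model.Num = c("123", "456", "123", "456", "123", "456"),
  Cost = c(200, 300, 250, 350, 300, 400)
)

# my_fun2 is a hypothetical name; `col` is a column name passed as a string.
my_fun2 <- function(data, col) {
  # .data[[col]] looks the column up by its string name, within each group
  mutate(data, position = cumsum(.data[[col]]))
}

# Grouped by Model.Num only, so the running total accumulates across dates
res <- my_fun2(group_by(machines, Model.Num), "Cost")
res$position
```

With the Model.Num grouping this should reproduce the position column from the second example.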
We can also use standard evaluation without the lazyeval package. Build the expression as a string with paste0() and use setNames() to give it the name of the new column.
library(tidyverse)
machines <- data.frame(
Date = c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"),
Model.Num = c("123", "456", "123", "456", "123", "456"),
Cost = c(200, 300, 250, 350, 300, 400)
)
my_fun <- function(data, col) {
  mutate_(data, .dots = setNames(paste0("cumsum(", col, ")"), "position"))
}
my_fun(machines %>% group_by(Date, Model.Num), "Cost")
# Source: local data frame [6 x 4]
# Groups: Date, Model.Num [6]
#
# Date Model.Num Cost position
# <fctr> <fctr> <dbl> <dbl>
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 250
# 4 2/28/2014 456 350 350
# 5 3/31/2014 123 300 300
# 6 3/31/2014 456 400 400
my_fun(machines %>% group_by(Model.Num), "Cost")
# Source: local data frame [6 x 4]
# Groups: Model.Num [2]
#
# Date Model.Num Cost position
# <fctr> <fctr> <dbl> <dbl>
# 1 1/31/2014 123 200 200
# 2 1/31/2014 456 300 300
# 3 2/28/2014 123 250 450
# 4 2/28/2014 456 350 650
# 5 3/31/2014 123 300 750
# 6 3/31/2014 456 400 1050
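To see what the .dots argument above actually receives, it can help to build the named string in isolation. This is a base R sketch only: the eval(parse()) step is a rough, ungrouped analogy for turning the string into an expression, not a description of dplyr's internals, and the variable names expr_text and dots are made up for illustration.

```r
machines <- data.frame(
  Date = c("1/31/2014", "1/31/2014", "2/28/2014", "2/28/2014", "3/31/2014", "3/31/2014"),
  Model.Num = c("123", "456", "123", "456", "123", "456"),
  Cost = c(200, 300, 250, 350, 300, 400)
)

col <- "Cost"

# paste0() builds the expression as text: "cumsum(Cost)"
expr_text <- paste0("cumsum(", col, ")")

# setNames() attaches the new column's name to that text
dots <- setNames(expr_text, "position")
names(dots)  # "position"
unname(dots) # "cumsum(Cost)"

# Rough analogy only: parse the text and evaluate it against the data frame.
# There is no grouping here, so this is a running total over all rows.
ungrouped <- eval(parse(text = dots[["position"]]), machines)
ungrouped
```

Because the expression is assembled by pasting strings, col should be a trusted, syntactically valid column name.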