How can I pass column names to dplyr if I do not know the column name, but want to specify it through a variable?
e.g. this works:
require(dplyr)
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
gdf <- df %.% group_by(group) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
But this does not:
require(dplyr)
someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
gdf <- df %.% group_by(someColumn) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
You can use summarise_ as follows:
I was about to ask the same question for my own problem when I found a solution: I wrap the expression in eval(as.symbol()).
I expect you just have to use eval.
Here's an answer to this straightforward question, obtained by picking through hadley's solution to his posted dupe.
gdf <- df %.% regroup(lapply(someColumn, as.symbol)) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
FWIW, my use case involved grouping by one variable column and one constant column. The solution to that is:
gdf <- df %.% regroup(lapply(c('constant_column', someColumn), as.symbol)) %.% summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
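For readers on a later dplyr where regroup and %.% are no longer available, here is a minimal sketch of the same idea, assuming dplyr 0.7 or newer. The list of symbols is built the same way, then spliced into group_by with !!!. The values in constant_column are made up for illustration, and this is an adaptation, not part of the original answer.

```r
library(dplyr)

someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")
df$constant_column <- "x"  # hypothetical constant column, for illustration only

# Turn the column names into symbols, as in the regroup() call above
group_syms <- lapply(c("constant_column", someColumn), as.symbol)

# Splice the symbols into group_by() with !!! (assumes dplyr >= 0.7)
gdf <- df %>%
  group_by(!!!group_syms) %>%
  summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
```

Because the grouping columns stay a plain character vector until the last moment, the same pattern should work for any number of column names.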
Finally, the posted eval solution doesn't work. That just makes a new column whose values are all what someColumn evals to. I'm not yet cool enough to leave a comment or downvote it.
I just gave a similar answer over at Group by multiple columns in dplyr, using string vector input, but for good measure: functions that allow you to operate on columns using strings have been added to dplyr. These have the same name as the regular dplyr functions, but end in an underscore. The functions are described in detail in this vignette.
Given df and someColumn from the OP, this now works a treat:
Note that it is group_by_, rather than group_by, and the %>% operator is used as %.% is deprecated.
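The code from this answer did not survive in the text above. The following is a reconstruction sketch, assuming the call was group_by_ on someColumn followed by the same summarise as in the question. Be aware that the underscore-suffixed functions were themselves deprecated in later dplyr releases, so this may emit a deprecation warning there.

```r
library(dplyr)

someColumn <- "group"
df <- as.data.frame(matrix(1:9, ncol = 3, nrow = 3))
df$group <- c("A", "B", "A")

# group_by_() accepts the column name as a string held in a variable
# (assumed reconstruction; deprecated in later dplyr versions)
gdf <- df %>%
  group_by_(someColumn) %>%
  summarise(m1 = mean(V1), m2 = mean(V2), m3 = mean(V3))
```

With the sample data, rows 1 and 3 fall into group A and row 2 into group B, so gdf should have one row per group.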