I'm trying to transfer my understanding of plyr into dplyr, but I can't figure out how to group by multiple columns.
library(plyr)
library(dplyr)
# make data with weird column names that can't be hard coded
data = data.frame(
asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace=TRUE),
a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace=TRUE),
value = rnorm(100)
)
# get the columns we want to average within
columns = names(data)[-3]
# plyr - works
ddply(data, columns, summarize, value=mean(value))
# dplyr - raises error
data %.%
group_by(columns) %.%
summarise(Value = mean(value))
#> Error in eval(expr, envir, enclos) : index out of bounds
What am I missing to translate the plyr example into a dplyr-esque syntax?
Edit 2017: dplyr has been updated, so a simpler solution is available. See the currently selected answer.
Since this question was posted, dplyr added scoped versions of `group_by` (see the dplyr documentation for `group_by_all`, `group_by_at` and `group_by_if`). These let you use the same helper functions you would use with `select`.

The output from your example question is as expected (compare it with the plyr result above).
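A minimal sketch of the scoped approach on the question's data, assuming a dplyr release that still ships the superseded `group_by_at()` (current releases do) and tidyselect's `all_of()`. `vars()` accepts the same helpers you would use inside `select()`, and the trailing `ungroup()` removes the grouping level that `summarise()` leaves behind:

```r
library(dplyr)

# The question's data: grouping columns are only known by name at run time
data <- data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace = TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
columns <- names(data)[-3]

result <- data %>%
  # scoped verb: vars() takes select()-style helpers; all_of() takes strings
  group_by_at(vars(all_of(columns))) %>%
  summarise(Value = mean(value)) %>%
  # summarise() drops only the last grouping level, so remove what is left
  ungroup()

print(result)
```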
Note that since `dplyr::summarize` only strips off one layer of grouping at a time, you've still got some grouping going on in the resulting tibble (which can sometimes catch people by surprise later down the line). If you want to be absolutely safe from unexpected grouping behavior, you can always add `%>% ungroup()` to your pipeline after you summarize.

To write the code out in full, here's an update of Hadley's answer using the new syntax:
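Hadley's updated code is not reproduced here. As a hedged sketch of the same idea in current dplyr (the underscore verbs such as `group_by_()` are deprecated), the character vector is converted into a list of symbols and spliced into `group_by()` with `!!!`, so each symbol behaves as if it had been typed as a separate argument. It assumes dplyr 1.0.0 or later for the `.groups` argument:

```r
library(dplyr)

data <- data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace = TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
columns <- names(data)[-3]

# Convert the character vector into a list of symbols
grouping_syms <- lapply(columns, as.symbol)

result <- data %>%
  # !!! splices the symbols in as separate group_by() arguments
  group_by(!!!grouping_syms) %>%
  # .groups = "drop" (assumes dplyr >= 1.0.0) returns an ungrouped result
  summarise(Value = mean(value), .groups = "drop")

print(result)
```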
Until dplyr has full support for string arguments, perhaps this gist is useful:
https://gist.github.com/skranz/9681509
It contains a bunch of wrapper functions like `s_group_by`, `s_mutate`, `s_filter`, etc. that take string arguments. You can mix them with the normal dplyr functions.
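The gist's own code is not shown here. To illustrate the wrapper idea only, here is a hypothetical `group_by_strings()` helper (not one of the gist's functions and not part of dplyr) that takes column names as strings and forwards them to `group_by_at()`, so it can sit in a pipeline next to ordinary dplyr verbs. The small data frame is made up for the example:

```r
library(dplyr)

# Hypothetical helper for illustration; not part of dplyr or the gist
group_by_strings <- function(df, ...) {
  cols <- c(...)            # collect the string arguments into one character vector
  group_by_at(df, cols)     # group_by_at() accepts a character vector of names
}

df <- data.frame(
  g1 = c("a", "a", "b"),
  g2 = c("x", "y", "x"),
  value = c(1, 2, 3)
)

result <- df %>%
  group_by_strings("g1", "g2") %>%                   # string-based wrapper...
  summarise(total = sum(value), .groups = "drop")    # ...mixed with a normal verb

print(result)
```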
One (tiny) case that is missing from the answers here, and that I wanted to make explicit, is when the variables to group by are generated dynamically midstream in a pipeline:
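The answer's original code is not reproduced here. A minimal sketch of the same pattern, assuming the magrittr pipe `%>%` (re-exported by dplyr), where `.` inside a nested call refers to the data as it is at that point in the pipeline. It swaps the deprecated `group_by_(.dots = ...)` for `group_by_at()`, which also accepts a character vector; the `key_` columns are made up for the example:

```r
library(dplyr)

data <- data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace = TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)

result <- data %>%
  # a midstream step creates the (hypothetical) columns we later group by
  mutate(
    key_first  = tolower(asihckhdoydkhxiydfgfTgdsx),
    key_second = tolower(a30mvxigxkghc5cdsvxvyv0ja)
  ) %>%
  # `.` is the data at this point, so grep() can see the new key_* columns
  group_by_at(grep("^key_", names(.), value = TRUE)) %>%
  summarise(Value = mean(value), .groups = "drop")

print(result)
```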
This basically shows how to use `grep` in conjunction with `group_by_(.dots = ...)` to achieve this.

General example on using the `.dots` argument as character vector input to the `dplyr::group_by` function:

Or without a hard-coded name for the grouping variable (as asked by the OP):
With the example of the OP:
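The code for these `.dots` examples is not reproduced here. As a hedged sketch of the OP's example in current dplyr, assuming version 1.1.0 or later for `pick()`, the character vector can be passed through `all_of()`, which raises an error if any listed column is missing:

```r
library(dplyr)  # assumes dplyr >= 1.1.0 for pick()

data <- data.frame(
  asihckhdoydkhxiydfgfTgdsx = sample(LETTERS[1:3], 100, replace = TRUE),
  a30mvxigxkghc5cdsvxvyv0ja = sample(LETTERS[1:3], 100, replace = TRUE),
  value = rnorm(100)
)
columns <- names(data)[-3]

result <- data %>%
  # pick() selects columns with select() semantics inside group_by()
  group_by(pick(all_of(columns))) %>%
  summarise(Value = mean(value), .groups = "drop")

print(result)
```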
See also the dplyr vignette on programming, which explains pronouns, quasiquotation, quosures, and tidyeval.