What would be the best tool/package to use to calculate proportions by subgroups? I thought I could try something like this:
data(mtcars)
library(plyr)
ddply(mtcars, .(cyl), transform, Pct = gear/length(gear))
But the output is not what I want, as I would want something with a number of rows equal to the number of cyl groups. Even if I change it to summarise, I still get the same problem.
I am open to other packages, but I thought plyr would be best, as I would eventually like to build a function around this. Any ideas?
I'd appreciate any help just solving a basic problem like this.
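To get the frequency within a group using dplyr, count each cyl and gear combination and then divide each count by the total for its cyl group. You can do the counting with count(), or equivalently with group_by() followed by summarise(n = n()). Be careful about what the grouping is at each stage, or your numbers will be off.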
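A minimal sketch of both approaches, assuming dplyr 1.0 or later, where count() on an ungrouped data frame returns an ungrouped result. That is why an explicit group_by(cyl) appears before the proportion step; the last pipeline shows what happens without it. The object names are illustrative only.

```r
library(dplyr)

# Option 1: count() each (cyl, gear) combination, then regroup by cyl so
# that prop.table() divides each count by its own cyl total.
# (Assumes dplyr >= 1.0: count() on ungrouped input returns ungrouped output.)
via_count <- mtcars %>%
  count(cyl, gear) %>%
  group_by(cyl) %>%
  mutate(prop = prop.table(n)) %>%
  ungroup()

# Option 2: group_by() + summarise(n = n()). With .groups = "drop_last",
# summarise() drops the last grouping variable (gear) and keeps cyl,
# so mutate() again works within each cyl group.
via_summarise <- mtcars %>%
  group_by(cyl, gear) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Pitfall: without regrouping, prop is relative to all 32 cars,
# not to each cyl group.
overall <- mtcars %>%
  count(cyl, gear) %>%
  mutate(prop = prop.table(n))

via_count
```

Passing .groups = "drop_last" explicitly keeps cyl as the remaining grouping in the second pipeline and also silences the grouping message that summarise() prints by default.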
See ?count: basically, count is a wrapper for summarise with n(), but it does the group_by for you. Look at the output of just mtcars %>% count(cyl, gear). Then we add an additional variable with mutate, named prop, which is the result of calling prop.table() on the n variable created by count(cyl, gear).

You could create this as a function using the SE (standard evaluation) versions of count(), that is, count_(). Look at the vignette on Non-Standard Evaluation in the dplyr package.

Here's a nice GitHub gist addressing lots of cross-tabulation variants with dplyr and other packages.
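The underscore SE verbs such as count_() were deprecated in dplyr 0.7.0 in favor of tidy evaluation (described in the "Programming with dplyr" vignette). A minimal sketch of the function idea using the {{ }} operator instead, assuming dplyr 1.0 or later; group_props() and its argument names are hypothetical, not part of any package.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): share of each `inner` value
# within each `outer` group. The {{ }} ("curly-curly") operator forwards
# the bare column names supplied by the caller into count() and group_by().
group_props <- function(data, outer, inner) {
  data %>%
    count({{ outer }}, {{ inner }}) %>%
    group_by({{ outer }}) %>%
    mutate(prop = n / sum(n)) %>%
    ungroup()
}

# Usage: proportion of each gear within each cyl group of mtcars.
gear_by_cyl <- group_props(mtcars, cyl, gear)
gear_by_cyl

# The same helper works for any pair of columns, e.g. cyl within gear.
cyl_by_gear <- group_props(mtcars, gear, cyl)
```

Because the helper takes bare column names, it is called the same way as count() itself; the regrouping on outer is built in, so the grouping pitfall above cannot occur.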