One of the great things about pivot tables in excel is that they provide subtotals automatically. First, I would like to know if there is anything already created within dplyr that can accomplish this. If not, what is the easiest way to achieve it?
In the example below, I show the mean displacement by number of cylinders and carburetors. For each group of cylinders (4,6,8), I'd like to see the mean displacement for the group (or total displacement, or any other summary statistic).
library(dplyr)
mtcars %>% group_by(cyl,carb) %>% summarize(mean(disp))
cyl carb mean(disp)
1 4 1 91.38
2 4 2 116.60
3 6 1 241.50
4 6 4 163.80
5 6 6 145.00
6 8 2 345.50
7 8 3 275.80
8 8 4 405.50
9 8 8 301.00
Here is a simple one-liner creating margins within a data_frame:
output:
You may also add labels for the summary statistics like:
output:
Something similar to
table
withaddmargins
(although actually adata.frame
)The bottom row is the column wise margins, columns named 1:8 are carbs, and Total is the rowwise margins.
Also possible by simply joining the two group results:
gives:
or with an additional column:
gives:
data.table It's very clunky, but this is one way:
This gives
I'd rather see results in something like an R
table
, but don't know of any functions for that.dplyr @akrun found this analogous code
We could wrap the repeat operations in a function
Either option can be made a little less ugly with data.table's
rbindlist
withfill
:You can use this wrapper around
ddply
, which appliesddply
for each possible margin andrbinds
the results with its usual output.To marginalize over all grouping factors:
To marginalize over
carb
only:Wrapper:
I know that this may not be a very elegant solution, but I hope it helps anyway:
(Edit: After a minor modification in the definition of
p
this now yields the same results as @Frank's and @akrun's solution)