Sum by two variables

Posted 2020-05-10 11:45

Question:

I have a data frame:

    Date     area sales
1 201204 shanghai    23
2 201204  beijing    25
3 201204  beijing    16
4 201205 shanghai    55
5 201205  beijing    17
6 201205 shanghai    16

What I want to output is a table as follows:

Date   shanghai  beijing 
201204  23        41
201205  71        17

How would I do this in R?

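Answer 1:

In base R, xtabs sums the left-hand side of the formula (sales) within each combination of the right-hand-side variables (Date and area):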

> xtabs(sales ~ Date + area, mydf)
        area
Date     beijing shanghai
  201204      41       23
  201205      17       71

To get the result as a data.frame, wrap the call in as.data.frame.matrix. Date is then stored in the row names rather than in a column.
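
As a minimal sketch of that conversion: the snippet below rebuilds the example data by hand (the name mydf follows the answer, and the column types are an assumption, since the question does not show them), then moves Date out of the row names into an ordinary column. The variable names tab and wide are made up for illustration.

```r
# Rebuild the example data from the question (column types are assumed)
mydf <- data.frame(
  Date  = c(201204, 201204, 201204, 201205, 201205, 201205),
  area  = c("shanghai", "beijing", "beijing", "shanghai", "beijing", "shanghai"),
  sales = c(23, 25, 16, 55, 17, 16)
)

# Cross-tabulate, summing sales within each Date/area cell
tab <- xtabs(sales ~ Date + area, data = mydf)

# Convert the table to a data.frame; Date ends up in the row names
wide <- as.data.frame.matrix(tab)

# Optional: turn the row names into a regular Date column
wide <- cbind(Date = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

Note that Date comes back as text after this step, because row names are always character; wrap it in as.numeric() if a numeric column is needed.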


As a more current alternative, you can combine "dplyr" (for aggregation) and "tidyr" (for reshaping), like this:

library(tidyr)
library(dplyr)
mydf %>% 
  group_by(Date, area) %>% 
  summarise(sales = sum(sales)) %>% 
  spread(area, sales)
# Source: local data frame [2 x 3]
# 
#     Date beijing shanghai
# 1 201204      41       23
# 2 201205      17       71
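
In newer versions of tidyr, spread() is superseded by pivot_wider(), although spread() still works. A rough equivalent of the pipeline above, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument), might look like the sketch below. The data is rebuilt by hand with assumed column types, and the name wide is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question (column types are assumed)
mydf <- data.frame(
  Date  = c(201204, 201204, 201204, 201205, 201205, 201205),
  area  = c("shanghai", "beijing", "beijing", "shanghai", "beijing", "shanghai"),
  sales = c(23, 25, 16, 55, 17, 16)
)

wide <- mydf %>%
  group_by(Date, area) %>%
  # One row per Date/area pair; drop the grouping afterwards
  summarise(sales = sum(sales), .groups = "drop") %>%
  # Spread the area values into columns, filled with the summed sales
  pivot_wider(names_from = area, values_from = sales)

wide
```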


Answer 2:

This is a natural fit for reshape2::dcast:

library(reshape2)
# assuming your data is called `D`
# rows are Date, columns are area, cells are the sum of sales
dcast(D, Date ~ area, value.var = "sales", fun.aggregate = sum)


Tags: r sum