Pivot using multiple columns

2019-07-19 02:28发布

I have a data set with 5 columns:

store_id    year    event    item    units
123         2015     sale_2   abc      2
234         2015     sale_3   def      1
345         2015     sale_2   xyz      5

I'm trying to rotate out the items by store_id, year, and event to get the sum. For instance

store_id    year    event    abc     def   xyz 
123          2015    sale_2   7       0     0
234          2015    sale_2   2       1     0

I'm having trouble figuring out the best method. Normally I'd use dummyVars in caret to do this but I need sums instead of flag. I've looked at tapply but it can't handle more than 2 grouping variables.

Any other suggestions?

标签: r aggregate
2条回答
Melony?
2楼-- · 2019-07-19 02:34
library(dplyr)
library(tidyr)
data %>%
group_by(store_id, year, event, item) %>%
summarize(N = sum(units)) %>%
spread(item, N)

You can use dplyr to group and summarize and tidyr to spread the data into the desired item columns.

查看更多
我命由我不由天
3楼-- · 2019-07-19 02:51
library(reshape2)
dcast(df, store_id + year + event ~ item, fun.aggregate = sum, value.var='units')
#    store_id year  event abc def xyz
# 1:      123 2015 sale_2   2   0   0
# 2:      234 2015 sale_3   0   1   0
# 3:      345 2015 sale_2   0   0   5

For large datasets consider

# uses dcast.data.table, much faster
library(data.table)
setDT(df)
dcast(df, store_id + year + event ~ item, fun.aggregate = sum, value.var='units') 
查看更多
登录 后发表回答