data.table: Sum by all existing combinations in ta

2019-02-20 04:29发布

I have a data.table out like this (in reality it is much larger):

out <-      code weights group
        1:    2   0.387      1
        2:    1   0.399      1
        3:    2   1.610      1
        4:    3   1.323      2
        5:    2   0.373      2                                            
        6:    1   0.212      2
        7:    3   0.316      3
        8:    2   0.569      3
        9:    1   0.120      3
       10:    1   0.354      3

It has 3 groups with different codes (column 1). In group #1, the code 3 does not appear, while in the other it appears.

Then, I want to sum the weights for every group and code combination . I achieve this with this command:

sum.dt <- out[,.(sum(weights)), by=list(code,group)][order(-V1)]

This works well but it does not have the combination Group 1 with Code 3 because it is not in the out table. I would like to have all possible combinations in sum.dt, and if the combination does not occur in the source table, it should sum up to 0, meaning the column V1 should be 0 in this row.

Any idea how I could achieve this?

2条回答
爱情/是我丢掉的垃圾
2楼-- · 2019-02-20 04:55

Using CJ (cross join) you can add the missing combinations:

library(data.table)
setkey(out, code, group)
out[CJ(code, group, unique = TRUE)
    ][, lapply(.SD, sum), by = .(code, group)
      ][is.na(weights), weights := 0]

gives:

   code group weights
1:    1     1   0.399
2:    1     2   0.212
3:    1     3   0.474
4:    2     1   1.997
5:    2     2   0.373
6:    2     3   0.569
7:    3     1   0.000
8:    3     2   1.323
9:    3     3   0.316

Or with xtabs as @alexis_laz showed in the comments:

xtabs(weights ~ group + code, out)

which gives:

     code
group     1     2     3
    1 0.399 1.997 0.000
    2 0.212 0.373 1.323
    3 0.474 0.569 0.316

If you want to get this output in a long-form dataframe, you can wrap the xtabs code in the melt function of the reshape2 (or data.table) package:

library(reshape2)
res <- melt(xtabs(weights ~ group + code, out))

which gives:

> class(res)
[1] "data.frame"
> res
  group code value
1     1    1 0.399
2     2    1 0.212
3     3    1 0.474
4     1    2 1.997
5     2    2 0.373
6     3    2 0.569
7     1    3 0.000
8     2    3 1.323
9     3    3 0.316

You could also do this with a combination of dplyr and tidyr:

library(dplyr)
library(tidyr)
out %>%
  complete(code, group, fill = list(weights=0)) %>%
  group_by(code, group) %>% 
  summarise(sum(weights))
查看更多
在下西门庆
3楼-- · 2019-02-20 05:06

I had a similar problem, and CJ did not work for some reason. A relatively simple solution I ended up using is first calling dcast and then melt (similar to the xtable solution above)- this also conveniently lets you specify the fill value for the missing combinations.

sum.dt <- dcast(out, code ~ group, value.var = 'weights', 
                fun.aggregate = sum, fill = 0)
sum.dt <- melt(sum.dt, id.vars = 'code', variable.name = 'group')

This gives

> sum.dt
   code group value
1:    1     1 0.399
2:    2     1 1.997
3:    3     1 0.000
4:    1     2 0.212
5:    2     2 0.373
6:    3     2 1.322
7:    1     3 0.474
8:    2     3 0.569
9:    3     3 0.316
查看更多
登录 后发表回答