Summarizing data by subgroups [closed]

Posted 2019-10-29 22:00

My dataset looks like this:

Org_ID   Market volume   Indicator variable
1        100             1
1        200             0
1        300             0
2         50             1
2        500             1
3        400             0
3        200             0
3        300             0
3        100             0

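I want to summarize the market TRx and calculate, for each Org_ID, the percentage of market volume that falls on rows where the indicator variable is 0, as shown below: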

Org_ID   % of 0's by market volume
1        83.3%
2        0%
3        100%

I tried grouping, but could not get it to work. Can anyone suggest a way to do this?

Answer 1:

dplyr

library(dplyr)

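df %>%
  group_by(Org_ID) %>%
  # !Indicator_variable is TRUE where the indicator is 0, so only that volume is summed
  summarize(sum_market_vol = sum(Market_volume*!Indicator_variable),
            tot_market_vol = sum(Market_volume)) %>%
  # Express the zero-indicator volume as a percentage of each group's total
  transmute(Org_ID, Perc_Market_Vol = 100*sum_market_vol/tot_market_vol)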

Result:

# A tibble: 3 x 2
  Org_ID Perc_Market_Vol
   <int>           <dbl>
1      1        83.33333
2      2         0.00000
3      3       100.00000
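To see why multiplying by `!Indicator_variable` isolates the zero-indicator volume, it may help to walk through a single group by hand. The sketch below uses the three Org_ID 1 rows from the question; the variable names `volume`, `indicator`, `keep`, and `zero_volume` are illustrative and not part of the original answer.

```r
# Org_ID 1 rows from the question
volume    <- c(100, 200, 300)
indicator <- c(1, 0, 0)

# Negating a numeric vector coerces it to logical first:
# 0 becomes TRUE, any non-zero value becomes FALSE
keep <- !indicator            # FALSE TRUE TRUE

# Logicals coerce back to 0/1 under multiplication,
# so rows with indicator 1 contribute nothing
zero_volume <- volume * keep  # 0 200 300

# Share of the group's volume that sits on indicator-0 rows
100 * sum(zero_volume) / sum(volume)  # 500 / 600, about 83.3
```

The same arithmetic gives 0 / 550 for Org_ID 2 and 1000 / 1000 for Org_ID 3, which matches the table above.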

Data:

df = structure(list(Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L), 
    Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L, 
    300L, 100L), Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L, 
    0L, 0L, 0L)), .Names = c("Org_ID", "Market_volume", "Indicator_variable"
), class = "data.frame", row.names = c(NA, -9L))
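For comparison, the same percentages can probably be obtained without dplyr. The following is a minimal base R sketch, not part of the original answer; it rebuilds the sample data so it runs on its own, and the names `zero_vol`, `zero_by_org`, `total_by_org`, and `perc_zero` are hypothetical.

```r
# Sample data from the answer, rebuilt so this snippet is self-contained
df <- data.frame(
  Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L),
  Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L, 300L, 100L),
  Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L)
)

# Volume counted only on rows where the indicator is 0
df$zero_vol <- df$Market_volume * (df$Indicator_variable == 0)

# Per-organisation sums; tapply returns an array named by Org_ID
zero_by_org  <- tapply(df$zero_vol, df$Org_ID, sum)
total_by_org <- tapply(df$Market_volume, df$Org_ID, sum)

# Percentage of each organisation's volume with indicator 0
perc_zero <- 100 * zero_by_org / total_by_org
round(perc_zero, 1)
```

Here the comparison `Indicator_variable == 0` is written out explicitly instead of using `!`, which makes the intent a little easier to read if the indicator could ever take values other than 0 and 1.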

