Summarizing data by subgroups [closed]

2019-08-21 10:46发布

My dataset looks like this

Org_ID      Market volume  Indicator variable
1                100              1

1                200              0

1                300              0

2                 50              1

2                500              1

3                400              0

3                200              0

3                300              0

3                100              0

And i want to summarize it by market TRx and org_id by calculating the % of 0 indicator variables in terms of market volume, as follows:

Org_ID   % of 0's by market volume
1   83.3%

2   0%

3   100%

I tried subgroups but can't seem to be able to do this. Can anyone suggest what are some of the ways i can do?

1条回答
Ridiculous、
2楼-- · 2019-08-21 11:52

with dplyr:

library(dplyr)

df %>%
  group_by(Org_ID) %>%
  summarize(sum_market_vol = sum(Market_volume*!Indicator_variable),
            tot_market_vol = sum(Market_volume)) %>%
  transmute(Org_ID, Perc_Market_Vol = 100*sum_market_vol/tot_market_vol)

Result:

# A tibble: 3 x 2
  Org_ID Perc_Market_Vol
   <int>           <dbl>
1      1        83.33333
2      2         0.00000
3      3       100.00000

Data:

df = structure(list(Org_ID = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L, 3L), 
    Market_volume = c(100L, 200L, 300L, 50L, 500L, 400L, 200L, 
    300L, 100L), Indicator_variable = c(1L, 0L, 0L, 1L, 1L, 0L, 
    0L, 0L, 0L)), .Names = c("Org_ID", "Market_volume", "Indicator_variable"
), class = "data.frame", row.names = c(NA, -9L))
查看更多
登录 后发表回答