Calculate relative frequency for a certain group

Posted 2020-04-14 08:35

I have a data.frame of categorical variables that I have divided into groups and I got the counts for each group.

My original data nyD looks like:

Source: local data frame [7 x 3]
Groups: v1, v2, v3

  v1    v2   v3
1  a  plus  yes
2  a  plus  yes
3  a minus   no
4  b minus  yes
5  b     x  yes
6  c     x notk
7  c     x notk

I performed the following operations using dplyr:

ny1 <- nyD %>% group_by(v1,v2,v3)%>%
           summarise(count=n()) %>%
           mutate(prop = count/sum(count))


My data "ny1" looks like:

Source: local data frame [5 x 5]
Groups: v1, v2

  v1    v2   v3 count prop
1  a minus   no     1    1
2  a  plus  yes     2    1
3  b minus  yes     1    1
4  b     x  yes     1    1
5  c     x notk     2    1

I want the prop variable to hold the relative frequency within each v1 group: the corresponding count divided by the sum of counts for that v1 group. The v1 groups have totals of 3 for "a", 2 for "b" and 2 for "c". That is, ny1$prop[1] <- 1/3, ny1$prop[2] <- 2/3, and so on. The mutate step using count/sum(count) does not give this. I need to specify that the sum should be taken only within each v1 group. Is there a way to use dplyr to achieve this?

Tags: r dplyr

1 answer

女痞

#2 · 2020-04-14 09:19

You can do this whole thing in one step, starting from your original data nyD and without creating ny1. When you run mutate after summarise, dplyr by default drops the last grouping level (v2), so the result is still grouped by v1 and sum(count) is computed within each v1 group (certainly my favorite feature in dplyr).

nyD %>% 
   group_by(v1, v2) %>%
   summarise(count = n()) %>%
   mutate(prop = count/sum(count))

# Source: local data frame [5 x 4]
# Groups: v1
# 
#   v1    v2 count      prop
# 1  a minus     1 0.3333333
# 2  a  plus     2 0.6666667
# 3  b minus     1 0.5000000
# 4  b     x     1 0.5000000
# 5  c     x     2 1.0000000
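
To see the grouping that drives this result, here is a minimal sketch that rebuilds the example data (the data.frame() call is reconstructed from the printout in the question, not taken from the original post) and inspects the grouping variables after summarise(). The `.groups = "drop_last"` argument only spells out the default behavior and assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Example data reconstructed from the question's printout (an assumption)
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk"),
  stringsAsFactors = FALSE
)

counts <- nyD %>%
  group_by(v1, v2) %>%
  summarise(count = n(), .groups = "drop_last")  # explicit form of the default

# After summarise(), only v1 is left as a grouping variable
group_vars(counts)

# sum(count) is therefore taken within each v1 group
res <- counts %>% mutate(prop = count / sum(count))
res
```

If `group_vars(counts)` shows something other than v1 in your session, the proportions will be computed over a different denominator, so it is worth checking before trusting prop.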

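Or a shorter version using count() (thanks to @beginneR). In recent dplyr versions count() returns its result with the same grouping as its input, so group by v1 explicitly before mutate():

nyD %>% 
  count(v1, v2) %>% 
  group_by(v1) %>%   # make the v1 grouping explicit
  mutate(prop = n/sum(n))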
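
Whether the output of count() is still grouped by v1 depends on the dplyr version; in recent releases, as far as I can tell, it keeps only the grouping of its input. The sketch below sets the grouping explicitly in both directions so the difference in the denominator is visible. The data.frame() call is reconstructed from the question's printout, and the names by_total and by_v1 are only illustrative.

```r
library(dplyr)

# Example data reconstructed from the question's printout (an assumption)
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk"),
  stringsAsFactors = FALSE
)

# Ungrouped: each count is divided by the total number of rows (7)
by_total <- nyD %>%
  count(v1, v2) %>%
  ungroup() %>%
  mutate(prop = n / sum(n))

# Grouped by v1: each count is divided by the total of its v1 group
by_v1 <- nyD %>%
  count(v1, v2) %>%
  group_by(v1) %>%
  mutate(prop = n / sum(n))

by_total
by_v1
```

Only by_v1 matches what the question asks for; by_total gives each combination's share of the whole data set.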
