I have a data.frame of categorical variables that I divided into groups, and I computed the count for each group.
My original data nyD looks like:
Source: local data frame [7 x 3]
Groups: v1, v2, v3

  v1    v2   v3
1  a  plus  yes
2  a  plus  yes
3  a minus   no
4  b minus  yes
5  b     x  yes
6  c     x notk
7  c     x notk
I performed the following operations using dplyr:
ny1 <- nyD %>%
  group_by(v1, v2, v3) %>%
  summarise(count = n()) %>%
  mutate(prop = count / sum(count))
My data "ny1" looks like:
Source: local data frame [5 x 5]
Groups: v1, v2

  v1    v2   v3 count prop
1  a minus   no     1    1
2  a  plus  yes     2    1
3  b minus  yes     1    1
4  b     x  yes     1    1
5  c     x notk     2    1
I want the prop variable to hold the relative frequency within each v1 group: each count divided by the sum of counts for its v1 level. The v1 totals are 3 for "a", 2 for "b" and 2 for "c". That is, ny1$prop[1] should be 1/3, ny1$prop[2] should be 2/3, and so on.

The mutate step using count/sum(count) does not give this. The printout shows that ny1 is still grouped by v1 and v2, so the sum is taken within each v1/v2 combination and every prop comes out as 1. I need to specify that the sum should be taken over the v1 group only. Is there a way to use dplyr to achieve this?
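For illustration, here is a minimal sketch of the calculation I am describing. It assumes that regrouping by v1 alone after the summarise step is an acceptable way to do it, and it rebuilds nyD from the rows printed above. This is one possible approach, not a confirmed solution.

```r
library(dplyr)

# Rebuild the example data from the rows shown in the question
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk"),
  stringsAsFactors = FALSE
)

ny1 <- nyD %>%
  group_by(v1, v2, v3) %>%
  summarise(count = n()) %>%           # one row per v1/v2/v3 combination
  group_by(v1) %>%                     # replace the leftover grouping with v1 only
  mutate(prop = count / sum(count))    # share of each row within its v1 level

print(ny1)
```

Calling group_by(v1) a second time replaces the grouping left over from summarise, so sum(count) is evaluated once per v1 level rather than once per v1/v2 combination.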