I have a data.frame of categorical variables that I have divided into groups, and I have computed the counts for each group.
My original data nyD looks like:
```
Source: local data frame [7 x 3]
Groups: v1, v2, v3

  v1    v2   v3
1  a  plus  yes
2  a  plus  yes
3  a minus   no
4  b minus  yes
5  b     x  yes
6  c     x notk
7  c     x notk
```
I performed the following operations using dplyr:
```r
library(dplyr)

ny1 <- nyD %>%
  group_by(v1, v2, v3) %>%
  summarise(count = n()) %>%
  mutate(prop = count / sum(count))
```
My data "ny1" looks like:
```
Source: local data frame [5 x 5]
Groups: v1, v2

  v1    v2   v3 count prop
1  a minus   no     1    1
2  a  plus  yes     2    1
3  b minus  yes     1    1
4  b     x  yes     1    1
5  c     x notk     2    1
```
I want `prop` to hold the relative frequency within each `v1` group: each count divided by the sum of counts for its `v1` group. The `v1` groups have totals of 3 "a", 2 "b" and 2 "c", so `ny1$prop[1]` should be 1/3, `ny1$prop[2]` should be 2/3, and so on. Using `count/sum(count)` in `mutate` does not do this (every `prop` comes out as 1), so I need to specify that the sum should be taken only over the `v1` group. Is there a way to achieve this with dplyr?
You can do this whole thing in one step (from your original data `nyD` and without creating `ny1`). That is because `summarise` drops the last grouping level by default (here `v2`, when you group by `v1` and `v2`), which is certainly my favorite feature in `dplyr`. A `mutate` run after `summarise` therefore aggregates only by `v1`.
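A minimal sketch of that one-step pipeline, assuming dplyr 1.0.0 or later (needed for the `.groups` argument) and rebuilding `nyD` as a plain, ungrouped data.frame from the printout in the question. `.groups = "drop_last"` only makes the default behaviour explicit (and silences the message newer dplyr versions print): after `summarise`, the result is still grouped by `v1`, so `sum(count)` is taken per `v1` group.

```r
library(dplyr)

# Rebuild the question's data as an ungrouped data.frame (values copied from the printout)
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk")
)

ny1 <- nyD %>%
  group_by(v1, v2) %>%
  # Count rows per (v1, v2); "drop_last" removes v2, leaving the result grouped by v1
  summarise(count = n(), .groups = "drop_last") %>%
  # sum(count) is now computed within each v1 group
  mutate(prop = count / sum(count))

print(ny1)
```

This gives `prop` values of 1/3 and 2/3 for "a", 1/2 and 1/2 for "b", and 1 for "c". `v3` is left out of the grouping here because in this data it takes a single value within each (`v1`, `v2`) pair; if you group by all three columns, you need an explicit `group_by(v1)` before `mutate`, since dropping only the last level would leave the data grouped by `v1` and `v2`.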
Or a shorter version using `count` (thanks to @beginneR).
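A sketch of the shorter `count()` variant, assuming a recent dplyr (1.0.0 or later) and the same reconstructed `nyD`. In current dplyr, `count()` groups only transiently, so its output keeps the grouping of its input. On an ungrouped data.frame the result is therefore ungrouped, and the explicit `group_by(v1)` below makes the per-`v1` denominator unambiguous. Older versions may have returned a result already grouped by `v1`. `count()` names its count column `n`.

```r
library(dplyr)

# Same reconstructed data as in the question (ungrouped data.frame)
nyD <- data.frame(
  v1 = c("a", "a", "a", "b", "b", "c", "c"),
  v2 = c("plus", "plus", "minus", "minus", "x", "x", "x"),
  v3 = c("yes", "yes", "no", "yes", "yes", "notk", "notk")
)

ny2 <- nyD %>%
  count(v1, v2) %>%            # one row per (v1, v2) with a count column n
  group_by(v1) %>%             # totals for the denominator are taken per v1
  mutate(prop = n / sum(n)) %>%
  ungroup()                    # return a plain, ungrouped result

print(ny2)
```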