This question already has an answer here:
-
Grouping functions (tapply, by, aggregate) and the *apply family
9 answers
I have grouped and summarized a data frame in R so that I now have a table like:
Group | Value | Count
==========================
A | 1 | 4
A | 2 | 2
A | 10 | 4
B | 3 | 2
B | 4 | 4
B | 2 | 3
C | 5 | 3
C | 2 | 6
I am interested in finding out the relative frequency of the value 2 within each group:
Group | Relative freq of 2
==========================
A | 2/(4+2+4) = 0.2
B | 3/(2+4+3) = 0.33
C | 6/(3+6) = 0.67
Is there a simple, elegant way of calculating this in R, other than writing a bunch of code with loops and conditionals? Possibly using dplyr.
Using dplyr
, after grouping by 'Group', we subset the 'Count' where 'Value' is 2 (assuming there is only a single 'Value' of 2 per each 'Group') and divide by the sum
of 'Count'
library(dplyr)
df1 %>%
group_by(Group) %>%
summarise(RelFreq = round(Count[Value==2]/sum(Count), 2))
# Group RelFreq
# <chr> <dbl>
#1 A 0.20
#2 B 0.33
#3 C 0.67
The corresponding data.table
option is
library(data.table)
setDT(df1)[, .(RelFreq = round(Count[Value == 2]/sum(Count),2)), by = Group]
Here is a base R solution:
sapply(split(df1, df1$Group),
function(x) round(sum(x$Count[x$Value == 2]) / sum(x$Count), 2))
## A B C
## 0.20 0.33 0.67
You can use the same logic using for loops
for(i in unique(df$Group)){
df$Relative_freq_of_2[df$Group==i] <- round(df$Count[df$Value==2 & df$Group==i]/sum(df$Count[df$Group==i]),2)
}
df <- unique(df[,c("Group","Relative_freq_of_2")])
Group Relative_freq_of_2
A 0.20
B 0.33
C 0.67
This one with sqldf:
library(sqldf)
df1 <- sqldf('select `Group`,`Count` from df where Value=2')
df2 <- sqldf('select `Group`, sum(`Count`) as `Count` from df group by `Group`')
df1$Count <- df1$Count / df2$Count
df1
Group Count
1 A 0.2000000
2 B 0.3333333
3 C 0.6666667