Groupwise computation in R [duplicate]

2019-07-18 08:34发布

问题:

This question already has an answer here:

  • Grouping functions (tapply, by, aggregate) and the *apply family 9 answers

I have grouped and summarized a data frame in R so that I now have a table like:

Group | Value | Count
==========================
   A  |   1   |   4
   A  |   2   |   2
   A  |   10  |   4
   B  |   3   |   2
   B  |   4   |   4
   B  |   2   |   3
   C  |   5   |   3
   C  |   2   |   6

I am interested in finding out the relative frequency of the value 2 within each group:

Group | Relative freq of 2
==========================
   A  |  2/(4+2+4) = 0.2
   B  |  3/(2+4+3) = 0.33
   C  |  6/(3+6) = 0.67

Is there a simple, elegant way of calculating this in R, other than writing a bunch of code with loops and conditionals? Possibly using dplyr.

回答1:

Using dplyr, after grouping by 'Group', we subset the 'Count' where 'Value' is 2 (assuming there is only a single 'Value' of 2 per each 'Group') and divide by the sum of 'Count'

library(dplyr)
df1 %>%
   group_by(Group) %>% 
   summarise(RelFreq = round(Count[Value==2]/sum(Count), 2))
#  Group RelFreq
#  <chr>   <dbl>
#1     A    0.20
#2     B    0.33
#3     C    0.67

The corresponding data.table option is

library(data.table)
setDT(df1)[, .(RelFreq = round(Count[Value == 2]/sum(Count),2)), by = Group]


回答2:

Here is a base R solution:

sapply(split(df1, df1$Group), 
   function(x) round(sum(x$Count[x$Value == 2]) / sum(x$Count), 2))

##  A    B    C 
## 0.20 0.33 0.67 


回答3:

You can use the same logic using for loops

for(i in unique(df$Group)){
  df$Relative_freq_of_2[df$Group==i] <- round(df$Count[df$Value==2 & df$Group==i]/sum(df$Count[df$Group==i]),2)
}

df <- unique(df[,c("Group","Relative_freq_of_2")])

Group Relative_freq_of_2
    A               0.20
    B               0.33
    C               0.67


回答4:

This one with sqldf:

library(sqldf)
df1 <- sqldf('select `Group`,`Count` from df where Value=2')
df2 <- sqldf('select `Group`, sum(`Count`) as `Count` from df group by `Group`')
df1$Count <- df1$Count / df2$Count
df1
Group     Count
1     A 0.2000000
2     B 0.3333333
3     C 0.6666667