Question:

I'm trying to filter out whole rows in R, but only if the frequencies for a particular set don't add up to more than 5.

The data I have looks a bit like this. It's a dataframe that I'm currently calling "Words":

HEADWORD VARIANT FREQUENCY
 SWORD    sword      2
 SWORD    swerd      1
 SWORD    sworde     1
 KNIGHT   knight     6
 KNIGHT   kniht      2
 KNIGHT   knyt       1

I only want rows for which the frequencies within a particular headword add up to more than 5. So here, I want to keep all the instances of KNIGHT but I want to get rid of all the SWORD rows entirely.
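For anyone who wants to try the answers below, here is a sketch that rebuilds the sample table as an R data frame and checks the per-headword totals. The name Words1 is taken from the code further down; the column types (character and integer) are an assumption about the real data.

```r
# Rebuild the sample data from the question (types are assumed)
Words1 <- data.frame(
  HEADWORD  = c("SWORD", "SWORD", "SWORD", "KNIGHT", "KNIGHT", "KNIGHT"),
  VARIANT   = c("sword", "swerd", "sworde", "knight", "kniht", "knyt"),
  FREQUENCY = c(2L, 1L, 1L, 6L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Total frequency per headword: SWORD sums to 4, KNIGHT to 9
totals <- tapply(Words1$FREQUENCY, Words1$HEADWORD, sum)
print(totals)
```

Only KNIGHT has a total above 5, so only its three rows should survive.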

I tried to do this with dplyr, but with no success. This is the code I tried:

Words1 %>% group_by(HW) %>%  filter(Fr > 5)

I'm out of ideas as to how else to do it and I'd really appreciate any help!
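A quick way to see why that attempt keeps the wrong rows: even after group_by(), a condition like FREQUENCY > 5 is checked row by row, not against the group total. The sketch below assumes that HW and Fr in the code above are meant to be HEADWORD and FREQUENCY, and rebuilds the sample data by hand.

```r
library(dplyr)

# Sample data from the question (column names assumed to be the full ones)
Words1 <- data.frame(
  HEADWORD  = c("SWORD", "SWORD", "SWORD", "KNIGHT", "KNIGHT", "KNIGHT"),
  VARIANT   = c("sword", "swerd", "sworde", "knight", "kniht", "knyt"),
  FREQUENCY = c(2L, 1L, 1L, 6L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Row-wise test: only rows whose own FREQUENCY exceeds 5 are kept,
# so just the single "knight" row survives
rowwise_result <- Words1 %>%
  group_by(HEADWORD) %>%
  filter(FREQUENCY > 5)
print(rowwise_result)
```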

Answer 1:

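After grouping by 'HEADWORD', compute the sum of 'FREQUENCY' inside filter() and check whether it is greater than 5. Because filter() evaluates the condition within each group, sum(FREQUENCY) gives one total per headword, so all rows of a headword are kept or dropped together: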

library(dplyr)

Words1 %>% 
     group_by(HEADWORD) %>% 
     filter(sum(FREQUENCY) > 5)
#   HEADWORD VARIANT FREQUENCY
#     <chr>   <chr>     <int>
#1   KNIGHT  knight         6
#2   KNIGHT   kniht         2 
#3   KNIGHT    knyt         1
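If you want to see the group totals that the filter is comparing against, a variant (a sketch, not part of the original answer) is to store them in a helper column first. TOTAL is a hypothetical column name chosen for illustration; it is dropped again at the end.

```r
library(dplyr)

# Sample data from the question
Words1 <- data.frame(
  HEADWORD  = c("SWORD", "SWORD", "SWORD", "KNIGHT", "KNIGHT", "KNIGHT"),
  VARIANT   = c("sword", "swerd", "sworde", "knight", "kniht", "knyt"),
  FREQUENCY = c(2L, 1L, 1L, 6L, 2L, 1L),
  stringsAsFactors = FALSE
)

# mutate() on grouped data repeats each headword's total on every row
with_totals <- Words1 %>%
  group_by(HEADWORD) %>%
  mutate(TOTAL = sum(FREQUENCY)) %>%
  ungroup()

# Keep headwords whose total exceeds 5, then drop the helper column
kept <- with_totals %>%
  filter(TOTAL > 5) %>%
  select(-TOTAL)
print(kept)
```

The one-line filter(sum(FREQUENCY) > 5) does the same thing without the extra column; the two-step version is mainly useful for checking the totals.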

Answer 2:

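You can also use the base R ave function (here df is your data frame). ave returns a vector the same length as FREQUENCY in which each element is replaced by the sum for its HEADWORD, so comparing it with 5 gives a logical index for the rows: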

df[ave(df$FREQUENCY, df$HEADWORD, FUN = sum) > 5, ]

#   HEADWORD VARIANT FREQUENCY
#4   KNIGHT  knight         6
#5   KNIGHT   kniht         2
#6   KNIGHT    knyt         1
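To make the intermediate step visible, the sketch below rebuilds the sample data as df (an assumption matching the answer's naming) and prints the vector that ave produces before it is used as a row index.

```r
# Sample data from the question, named df as in the answer
df <- data.frame(
  HEADWORD  = c("SWORD", "SWORD", "SWORD", "KNIGHT", "KNIGHT", "KNIGHT"),
  VARIANT   = c("sword", "swerd", "sworde", "knight", "kniht", "knyt"),
  FREQUENCY = c(2L, 1L, 1L, 6L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Each element is replaced by the total for its headword
group_totals <- ave(df$FREQUENCY, df$HEADWORD, FUN = sum)
print(group_totals)
# [1] 4 4 4 9 9 9

# Logical index: TRUE only for the KNIGHT rows (original row names 4-6)
kept <- df[group_totals > 5, ]
print(kept)
```

Note that FUN has to be passed by name, because ave treats any unnamed extra arguments as grouping variables.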