Filter groups by occurrence of a value

Posted 2020-01-28 05:57

Question:

How can I select whole groups based on a condition on their individual rows, for example keep all groups that contain the value 4 (or satisfy any other condition)?

Let's take a very simple data set with two groups; I want to select group B (as it has a Value of 4):

library(dplyr)
df <- data.frame(Group=LETTERS[c(1,1,1,2,2,2)], Value=c(1:5,4))

> df
  Group Value
1     A     1
2     A     2
3     A     3
4     B     4
5     B     5
6     B     4

Doing group_by() and then filter() (as in this post) will only select the individual rows that contain a value of 4, not the whole group:

df %>%
  group_by(Group) %>%
  filter(Value==4)

   Group Value
  <fctr> <dbl>
1      B     4
2      B     4

Answer 1:

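This turns out to be pretty easy: you just need to wrap the condition in any() inside the filter() call. After group_by(), filter() evaluates its condition once per group, so:

  • any(Value==4) returns a single TRUE or FALSE per group, which is recycled to every row of that group, so the whole group is kept or dropped,

  • Value==4 returns one logical per row, so rows are kept or dropped individually, even when preceded by group_by().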

Hence use:

df %>%
  group_by(Group) %>%
  filter(any(Value==4))

   Group Value
  <fctr> <dbl>
1      B     4
2      B     5
3      B     4
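Since the question also asks about other conditions, here is a minimal sketch of a few more group-level filters, assuming the same df as in the question. The names all_ge4, twice4 and no4 are made up for illustration. Any expression that returns a single TRUE/FALSE per group should behave the same way.

```r
library(dplyr)

df <- data.frame(Group = LETTERS[c(1, 1, 1, 2, 2, 2)], Value = c(1:5, 4))

# Keep groups in which every Value is at least 4
all_ge4 <- df %>%
  group_by(Group) %>%
  filter(all(Value >= 4)) %>%
  ungroup()

# Keep groups in which the value 4 occurs at least twice
twice4 <- df %>%
  group_by(Group) %>%
  filter(sum(Value == 4) >= 2) %>%
  ungroup()

# Negate the condition: keep groups that contain no 4 at all
no4 <- df %>%
  group_by(Group) %>%
  filter(!any(Value == 4)) %>%
  ungroup()
```

With this data the first two keep all three rows of group B, and the last keeps all three rows of group A.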

Interestingly, the same happens with mutate(); compare:

df %>%
  group_by(Group) %>%
  mutate(check1=any(Value==4),
         check2=Value==4)

   Group Value check1 check2
  <fctr> <dbl>  <lgl>  <lgl>
1      A     1  FALSE  FALSE
2      A     2  FALSE  FALSE
3      A     3  FALSE  FALSE
4      B     4   TRUE   TRUE
5      B     5   TRUE  FALSE
6      B     4   TRUE   TRUE
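For comparison, here is a rough base R sketch of the same per-group recycling, assuming the df from the question. ave() applies a function within each group and returns a vector as long as its input, much like the check1 column. The names has4 and kept are illustrative only.

```r
df <- data.frame(Group = LETTERS[c(1, 1, 1, 2, 2, 2)], Value = c(1:5, 4))

# One logical per row, but constant within each group (like check1)
has4 <- ave(df$Value == 4, df$Group, FUN = any)

# Subsetting with it keeps or drops whole groups at once
kept <- df[has4, ]
```

Here kept should contain the three rows of group B, the same rows that filter(any(Value==4)) returns.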


Answer 2:

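A data.table option is to return .SD (the rows of the current group) only when the condition holds; for the other groups the if() yields NULL, so they contribute no rows: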

library(data.table)
setDT(df)[, if(any(Value==4)) .SD, by = Group]
#   Group Value
#1:     B     4
#2:     B     5
#3:     B     4


Tags: r dplyr