Subset data frame to include only levels of one fa

Posted 2019-01-15 16:50

Question:

I am working with a data frame of numeric measurements. Some individuals have been measured several times, both as juveniles and as adults. A reproducible example:

ID <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each=5)
size <- rnorm(10)

# e.g. a1 is measured 3 times, twice as a juvenile, once as an adult.
d <- data.frame(ID, age, size)

My goal is to subset that data frame by selecting the IDs that appear at least once as a juvenile and at least once as an adult. I am not sure how to do that.

The resulting dataframe would contain all measurements for individuals a1, a2 and a3, but would exclude a4, a5 and a6, as they were not measured at both stages.

A similar question was asked 7 months ago but never received an answer (Subset data frame to include only levels one factor that have values in both levels of another factor).

Thanks!

Answer 1:

Here is one option with data.table:

library(data.table)
setDT(d)[, .SD[all(c("juvenile", "adult") %in% age)], ID]

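Or a base R option with ave:

# Count the distinct age values per ID on integer codes, so that ave()
# returns numbers and the "> 1" test is a numeric comparison.
d[with(d, ave(as.integer(factor(age)), ID, FUN = function(x) length(unique(x))) > 1), ]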
#   ID      age       size
#1  a1 juvenile -1.4545407
#2  a2 juvenile -0.4695317
#3  a3 juvenile  0.2271316
#5  a1 juvenile  0.2961210
#6  a2    adult -0.8331993
#9  a1    adult -0.6924967
#10 a3    adult -0.4619550
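
The ave condition keeps IDs with more than one distinct age value, which means "both stages" only because age has exactly two levels. A minimal sketch of a cross-tabulation variant that would also cover three or more stages, assuming base R only (the seed and the names counts, keep_ids and result are illustrative, not from the original post):

```r
ID  <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
set.seed(1)  # assumption: arbitrary seed, only so the sketch is repeatable
d <- data.frame(ID, age, size = rnorm(10), stringsAsFactors = FALSE)

# Rows are IDs, columns are stages, cells are numbers of measurements
counts <- table(d$ID, d$age)

# Keep the IDs that have at least one measurement in every stage
keep_ids <- rownames(counts)[rowSums(counts > 0) == ncol(counts)]

result <- d[d$ID %in% keep_ids, ]
```

Printing counts is also a quick way to see why a4, a5 and a6 are dropped: each has a zero in one of the two columns.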


Answer 2:

With dplyr, you can use group_by %>% filter:

library(dplyr)
d %>% group_by(ID) %>% filter(all(c("juvenile", "adult") %in% age))

# A tibble: 7 x 3
# Groups:   ID [3]
#      ID      age       size
#  <fctr>   <fctr>      <dbl>
#1     a1 juvenile -0.6947697
#2     a2 juvenile -0.3665272
#3     a3 juvenile  1.0293555
#4     a1 juvenile  0.2745224
#5     a2    adult  0.5299029
#6     a1    adult  2.2247802
#7     a3    adult -0.4717160
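
The condition all(c("juvenile", "adult") %in% age) is evaluated once per ID, and a TRUE keeps every row of that group. A rough base R sketch of that per-group evaluation, for inspection only (has_both, flag and the seed are hypothetical names and choices, not part of dplyr or of the original answer):

```r
ID  <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)
set.seed(42)  # assumption: arbitrary seed, only so the sketch is repeatable
d <- data.frame(ID, age, size = rnorm(10), stringsAsFactors = FALSE)

# Hypothetical helper: TRUE when a group's ages cover both stages
has_both <- function(a) all(c("juvenile", "adult") %in% a)

# Evaluate the condition once per ID; the result is a named logical vector
flag <- vapply(split(d$age, d$ID), has_both, logical(1))
flag
#    a1    a2    a3    a4    a5    a6
#  TRUE  TRUE  TRUE FALSE FALSE FALSE

# Look up each row's flag by its ID and keep the rows flagged TRUE
result <- d[unname(flag[d$ID]), ]
```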


Answer 3:

Split the IDs by age, intersect the groups, and subset:

d[d$ID %in% Reduce(intersect, split(d$ID, d$age)),]
#   ID      age        size
#1  a1 juvenile  1.44761836
#2  a2 juvenile  1.70098645
#3  a3 juvenile  0.08231986
#5  a1 juvenile  0.91240568
#6  a2    adult -1.77318962
#9  a1    adult  0.13597986
#10 a3    adult -1.18575294
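
To make the one-liner easier to follow, here is a sketch of its intermediate values, using only the ID and age vectors from the question (by_stage and both are illustrative names):

```r
ID  <- c("a1", "a2", "a3", "a4", "a1", "a2", "a5", "a6", "a1", "a3")
age <- rep(c("juvenile", "adult"), each = 5)

# One vector of IDs per stage: by_stage$adult and by_stage$juvenile
by_stage <- split(ID, age)

# Fold intersect() over the list: IDs that occur in every stage
both <- Reduce(intersect, by_stage)
sort(both)
# "a1" "a2" "a3"
```

Because Reduce folds intersect over the whole list, the same expression should carry over unchanged if age had more than two levels.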