selection of observations by combining criteria in

2019-09-16 16:54发布

问题:

This topic has probably been brought up and it is a quite simpe solution , i guess. However i couldnt make it up to now. Lets say i have a data.frame (called "data") which contains 10 individuals (id) on which i collected observations at 3 time points (T)

> data <- data.frame(id = rep(c(1:10), 3),
                     T  = gl(3, 10),
                     X  = sample(1:30),
                     Y  = sample(c("yes", "no"), 30, replace = TRUE),
                     Z  = sample(1:40, 30),
                     Z2 = rnorm(30, mean = 5, sd = 0.5))

    > head(data)
      id T  X   Y  Z       Z2
    1  1 1 10 yes 15 5.993605
    2  2 1 18  no 22 6.096566
    3  3 1  5  no 24 5.101393
    4  4 1 15 yes 18 4.944108
    5  5 1 23  no 34 4.634176
    6  6 1 13  no 27 5.576015

I would like to create a subset of this data.frame (an new data.frame called data2) by selecting only individuals that have "yes" (variable Y) for each of the three time points (variable T), that means Y="yes" for T=1 and T=2 and T=3.

I know that combining conditions can be achieved by using the "&" sign, and this can be used to relate conditions for the 3 time points. However, my problem is to write each condition for each time point : how to tell R that i want subjects for which Y="yes" at T="1" for example ?

Thank you very much in advance to all. Have a great day,

Denis

回答1:

You can do:

keep.ids <- tapply(data$Y, data$id, FUN = function(x)all(x == "yes"))
subset(data, keep.ids[factor(id)])

Or use the plyr package:

library(plyr)
ddply(data, "id", function(x) if(all(x$Y == "yes")) x else NULL)


标签: r selection