I am trying to extract from my R dataframe, rows that have duplicate values in one column but which in another column have either a 0 or a 1.
For example, if this is the dataframe:
Data <- data.frame(
+ X = c(1,3,5,7,7,8,9,10,10,11,11),
+ Y = sample(36476545:36476555),
+ timepoint = c(0,1,0,0,1,1,0,1,0,1,1)
+ )
which looks like
> Data
X Y timepoint
1 1 36476549 0
2 3 36476545 1
3 5 36476552 0
4 7 36476547 0
5 7 36476546 1
6 8 36476548 1
7 9 36476551 0
8 10 36476555 1
9 10 36476553 0
10 11 36476554 1
11 11 36476550 1
My desired output will be all rows for which values are duplicated in X with timepoint = 0 for one occurrence of the value and 1 for the other occurrence, resulting in
> Data
X Y timepoint
4 7 36476547 0
5 7 36476546 1
8 10 36476555 1
9 10 36476553 0
Note that the last two items of data, which are also duplicated in X are not counted because the timepoint variable is 1 in both cases. There is a solution in SQL which comes close but I do not know how to code that in R.
Solution I tried by first creating a dataframe of the duplicates and then attempting to get the ones I want from there:
dupes <- Data[Data$X %in%
Data$X[duplicated(Data$X)],]
ids <- Data$X[Data$timepoint==0]
Data[Data$X %in% ids,]
But this returned rows which do not have a duplicate entry. Any help would be appreciated, thank you!