I have done a bunch of searching for a solution to this and either can't find one or don't know it when I see it. I've seen some topics that are close to this but deal with matching between two different dataframes, whereas this is dealing with a single dataframe.
I have a dataframe with two groups (factors, col1) and a sampling date (date, col2), and then the measurement (numeric, col3). I would like to eventually run a statistical test on a paired sample between group A and B, so in order to create the paired sample, I want to only keep the records that have a measurement taken on the same day for both groups. In other words, remove the records in group A that do not have a corresponding measurement taken on the same day in group B, and vice versa. In the sample data below, that would result in rows 4 and 8 being removed. Another way of thinking of it is, how do I search for and remove records with only one occurrence of each date?
Sample data:
my.df <- data.frame(col1 = as.factor(c(rep("A", 4), rep("B", 4))),
col2 = as.Date(c("2001-01-01", "2001-01-02", "2001-01-03",
"2001-01-04", "2001-01-01", "2001-01-02", "2001-01-03",
"2001-02-03")),
col3 = sample(8))
Here are a few alternatives:
1) ave
2) split / Filter / do.call
3) dplyr (2) translates nearly directly into a dplyr solution:
4) data.table The last two solutions can also be translated to data.table
5) tapply
6) merge
7) sqldf (6) is similar to the following sqldf solution: