I would like to subset an unbalanced panel data set by group. For each group, I would like to keep the two observations in the first and the last years.
How do I best do this in R? For example:
dt <- data.frame(name = rep(c("A", "B", "C"), c(3, 2, 3)),
                 year = c(2001:2003, 2000, 2002, 2000:2001, 2003))
> dt
  name year
1    A 2001
2    A 2002
3    A 2003
4    B 2000
5    B 2002
6    C 2000
7    C 2001
8    C 2003
What I would like to have:
  name year
1    A 2001
3    A 2003
4    B 2000
5    B 2002
6    C 2000
8    C 2003
dplyr should help. Check out first() and last() to get the values you are looking for, and then filter based on those values.
Note: your example didn't mention any specific order, but if the rows need to be sorted first, arrange() will help.
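A minimal sketch of how that suggestion could look, assuming the rows should be ordered by year within each name. The exact pipeline is my reading of the hint above, not necessarily what the answerer had in mind. Note that if several rows share the first (or last) year of a group, this keeps all of them.

```r
library(dplyr)

# Example data from the question
dt <- data.frame(
  name = rep(c("A", "B", "C"), c(3, 2, 3)),
  year = c(2001:2003, 2000, 2002, 2000:2001, 2003)
)

res <- dt %>%
  arrange(name, year) %>%   # sort so first()/last() are the earliest/latest year
  group_by(name) %>%
  filter(year == first(year) | year == last(year)) %>%
  ungroup()

print(res)
```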
Here's a quick possible data.table solution; there is also a shorter variant if you only have two columns.
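The code for this answer did not survive, so the following is only a sketch of what a data.table approach might look like, assuming the usual idiom of indexing .SD with the first row and .N (the group size). The unique() wrapper is my own addition so that a group with a single row is not returned twice.

```r
library(data.table)

# Example data from the question, converted to a data.table
DT <- as.data.table(data.frame(
  name = rep(c("A", "B", "C"), c(3, 2, 3)),
  year = c(2001:2003, 2000, 2002, 2000:2001, 2003)
))

# Sort by reference so row 1 and row .N are the first and last year per group
setorder(DT, name, year)

# General version: keep the first and last row of each group, all columns
res_all <- DT[, .SD[unique(c(1L, .N))], by = name]

# Two-column version: index the year vector directly
res_two <- DT[, .(year = year[unique(c(1L, .N))]), by = name]

print(res_all)
print(res_two)
```

The two-column form avoids building .SD for every group, which tends to matter when there are many groups.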
This is pretty simple using by to split the data.frame by group and then return the head and tail of each group. head and tail are convenient, but slow, so a slightly different alternative would probably be faster on a large data.frame.
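A sketch of both base R variants, assuming the data are sorted by year within each name first. The second version, which indexes rows 1 and nrow(x) directly, is my guess at the "slightly different alternative"; the original code is not shown here.

```r
# Example data from the question
dt <- data.frame(
  name = rep(c("A", "B", "C"), c(3, 2, 3)),
  year = c(2001:2003, 2000, 2002, 2000:2001, 2003)
)

# Sort so the first and last row of each group are the first and last year
dt <- dt[order(dt$name, dt$year), ]

# Version 1: split with by(), keep head and tail of each piece, then recombine
res_ht <- do.call(rbind, by(dt, dt$name, function(x) {
  rbind(head(x, 1), tail(x, 1))
}))

# Version 2: same idea, but index the first and last row directly
res_ix <- do.call(rbind, by(dt, dt$name, function(x) {
  x[c(1, nrow(x)), ]
}))

print(res_ht)
print(res_ix)
```

Both versions return a group's only row twice when the group has a single observation; wrapping the index in unique() would avoid that in the second version.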