id <- c(1:8, 1:8)
age1 <- c(7.5, 6.7, 8.6, 9.5, 8.7, 6.3, 9, 5)
age2 <- age1 + round(runif(1, 1, 3), 1)
age <- c(age1, age2)
tanner <- sample(1:2, 16, replace = TRUE)
df <- data.frame(id, age, tanner)
   id  age tanner
1   1  7.5      2
2   2  6.7      1
3   3  8.6      2
4   4  9.5      2
5   5  8.7      1
6   6  6.3      1
7   7  9.0      1
8   8  5.0      1
9   1 10.0      1
10  2  9.2      1
11  3 11.1      1
12  4 12.0      2
13  5 11.2      2
14  6  8.8      2
15  7 11.5      1
16  8  7.5      1
Above is a sample data frame. I would like to convert it into the format below.
id   age at tanner=1   age at tanner=2
1    10                7.5
2    6.7               NA
3    11.1              8.6
4    NA                9.5
...
If the tanner record is the same at both ages, I want to keep the younger age.
For example,
id age tanner
2 6.7 1
2 9.2 1
In this case, 6.7 will be kept for id=2 in the new dataset.
Use aggregate, then reshape (working from a copied and pasted version of your df rather than your code, which doesn't reproduce it).

A little dplyr and tidyr does the trick here. arrange by age so the lowest age appears first, then use a filter for duplicated id/tanner, then utilize tidyr::spread.
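
The original code for these two suggestions did not survive, so what follows is only a rough sketch of how they might look, not the answerers' exact code. It assumes the data frame is rebuilt from the printed output in the question (the random generating code would not reproduce it), and the names youngest, wide_base and wide_tidy are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Data copied from the printed data frame in the question.
df <- data.frame(
  id     = rep(1:8, 2),
  age    = c(7.5, 6.7, 8.6, 9.5, 8.7, 6.3, 9.0, 5.0,
             10.0, 9.2, 11.1, 12.0, 11.2, 8.8, 11.5, 7.5),
  tanner = c(2, 1, 2, 2, 1, 1, 1, 1,
             1, 1, 1, 2, 2, 2, 1, 1)
)

# Base R: take the minimum age per id/tanner pair, then go from long to wide.
youngest  <- aggregate(age ~ id + tanner, data = df, FUN = min)
wide_base <- reshape(youngest, idvar = "id", timevar = "tanner",
                     direction = "wide")
wide_base <- wide_base[order(wide_base$id), ]  # columns: id, age.1, age.2

# dplyr + tidyr: sort by age so the youngest record comes first, drop
# repeated id/tanner pairs, then spread tanner into columns "1" and "2".
wide_tidy <- df %>%
  arrange(age) %>%
  filter(!duplicated(paste(id, tanner))) %>%
  spread(tanner, age) %>%
  arrange(id)
```

Both results should hold one row per id, with NA wherever an id never had that tanner stage. Note that tidyr::spread still works but has been superseded by pivot_wider in newer tidyr releases.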

We can use dcast to convert from 'long' to 'wide' and use the fun.aggregate as min. Here I converted the 'data.frame' to 'data.table' (setDT(df)) as the dcast from data.table would be fast.
If we want to change the 'Inf' to 'NA'
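
The code for this answer is missing as well, so here is a minimal sketch of the dcast approach, followed by one possible way to swap any Inf for NA afterward. It assumes the data frame is rebuilt from the printed output in the question; wide_dt and the loop over columns are illustrative choices, not necessarily what the answer used.

```r
library(data.table)

# Data copied from the printed data frame in the question.
df <- data.frame(
  id     = rep(1:8, 2),
  age    = c(7.5, 6.7, 8.6, 9.5, 8.7, 6.3, 9.0, 5.0,
             10.0, 9.2, 11.1, 12.0, 11.2, 8.8, 11.5, 7.5),
  tanner = c(2, 1, 2, 2, 1, 1, 1, 1,
             1, 1, 1, 2, 2, 2, 1, 1)
)

setDT(df)  # convert to data.table by reference

# Long to wide, keeping the minimum age per id/tanner pair. min() on an
# empty group can warn and yield Inf, hence suppressWarnings().
wide_dt <- suppressWarnings(
  dcast(df, id ~ tanner, value.var = "age", fun.aggregate = min)
)

# Replace any Inf left in the tanner columns with NA, by reference.
for (col in setdiff(names(wide_dt), "id")) {
  set(wide_dt, i = which(is.infinite(wide_dt[[col]])), j = col,
      value = NA_real_)
}
```

The replacement loop is a no-op for columns that contain no Inf, so it is safe to run regardless of how the empty id/tanner combinations were filled.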