I have a data frame that contains multiple subjects (id), with repeated observations (recorded at times time). Each of the times may or may not be associated with an event (event). An example data frame can be generated with:
set.seed(12345)
id <- c(rep(1, 9), rep(2, 9), rep(3, 9))
time <- c(seq(from = 0, to = 96, by = 12),
seq(from = 0, to = 80, by = 10),
seq(from = 0, to = 112, by = 14))
random <- runif(n = 27)
event <- rep(100, 27)
df <- data.frame(cbind(id, time, event, random))
df$event <- ifelse(df$random < 0.55, 0, df$event)
df <- subset(df, select = -c(random))
df$event <- ifelse(df$time == 0, 100, df$event)
I would like to calculate the time elapsed since the most recent event (tae [time after the last event]), such that the ideal output would look like:
head(ideal_df)
  id time event tae
1  1    0   100   0
2  1   12   100   0
3  1   24   100   0
4  1   36   100   0
5  1   48     0  12
6  1   60     0  24
In Fortran, I use the following code to create the tae variable:
IF(EVENT.GT.0) THEN
TEVENT = TIME
TAE = 0
ENDIF
IF(EVENT.EQ.0) THEN
TAE = TIME - TEVENT
ENDIF
In R, I have attempted both an ifelse and a dplyr solution. However, neither produces my desired output.
# Calculate the time since last event (using ifelse)
df$tae <- ifelse(df$event >= 0, df$tevent = df$time & df$tae = 0, df$tae = df$time - df$tevent)
Error: unexpected '=' in "df$tae <- ifelse(df$event >= 0, df$tevent ="
# Calculate the time since last event (using dplyr)
res <- df %>%
arrange(id, time) %>%
group_by(id) %>%
mutate(tae = time - lag(time))
res
  id time event tae
1  1    0   100  NA
2  1   12   100  12
3  1   24   100  12
4  1   36   100  12
5  1   48     0  12
6  1   60     0  12
Clearly, neither of these yields my desired output. It appears that assigning variables within the ifelse function is not tolerated by R. My attempt at a dplyr solution also fails to account for the event variable: it only takes the difference between consecutive times.
Lastly, another variable that records the time until the next event (tue) will be needed. If anyone happens to have a thought regarding how best to go about this (perhaps more tricky) calculation, please feel free to share.
Any thoughts regarding how to get one of these working (or an alternative solution) would be greatly appreciated. Thanks!
P.S. -- A reproducible example in which the interval between events changes within an ID is presented below:
id <- rep(1, 9)
time <- c(0, 10, 22, 33, 45, 57, 66, 79, 92)
event <- c(100, 0, 0, 100, 0, 100, 0, 0, 100)
df <- data.frame(cbind(id, time, event))
head(df)
  id time event
1  1    0   100
2  1   10     0
3  1   22     0
4  1   33   100
5  1   45     0
6  1   57   100
I guess you might be impressed by the compactness of dplyr, but going through a lot of unnecessary calculations really hurts your time performance...
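For illustration, here is a minimal vectorised base R sketch of the tae calculation on the second example from the question. It is only one possible approach, not necessarily what this answer had in mind. It assumes that event > 0 marks an event (mirroring the Fortran EVENT.GT.0 test) and that rows before the first event of a subject should get NA; the helper name last_event is made up for the sketch.

```r
# Second example from the question
df <- data.frame(
  id    = rep(1, 9),
  time  = c(0, 10, 22, 33, 45, 57, 66, 79, 92),
  event = c(100, 0, 0, 100, 0, 100, 0, 0, 100)
)

# Sort so that the running maximum follows time order within each id
df <- df[order(df$id, df$time), ]

# Event rows contribute their own time, non-event rows contribute -Inf;
# cummax() then carries the time of the latest event forward within each id
last_event <- ave(ifelse(df$event > 0, df$time, -Inf), df$id, FUN = cummax)

# Time after the last event
df$tae <- df$time - last_event

# Assumption: rows with no earlier event in their id get NA instead of Inf
df$tae[is.infinite(df$tae)] <- NA

df
```

Because ave() applies cummax() separately per id, the time of the last event never leaks from one subject into the next.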
Here's an approach with dplyr:
The new columns include time after event (tae) and time before event (tbe).
The result:
The result with the second example:
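A minimal sketch of how tae and tbe could be computed with dplyr on the second example, offered as an illustration rather than as the code this answer used. It assumes that event > 0 marks an event, that an event row counts as its own last and next event (so both columns are 0 there), and that rows with no earlier or later event in their id end up as Inf.

```r
library(dplyr)

# Second example from the question
df <- data.frame(
  id    = rep(1, 9),
  time  = c(0, 10, 22, 33, 45, 57, 66, 79, 92),
  event = c(100, 0, 0, 100, 0, 100, 0, 0, 100)
)

res <- df %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(
    # Running maximum of the event times = time of the last event so far
    tae = time - cummax(if_else(event > 0, time, -Inf)),
    # Running minimum taken from the end = time of the next event
    tbe = rev(cummin(rev(if_else(event > 0, time, Inf)))) - time
  ) %>%
  ungroup()

res
```

On this data the events fall at times 0, 33, 57 and 92, so for example the row at time 45 gets tae = 12 (45 - 33) and tbe = 12 (57 - 45).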
You were very close with your dplyr implementation. Try this:
I can't think of a way to vectorize it right now, but here's a loop that should be decently quick (O(n)).
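As a sketch of what such a single-pass loop might look like, the following is a direct translation of the Fortran logic from the question, shown on the second example. It assumes the data are sorted by id and time, resets the remembered event time whenever the id changes, and leaves tae as NA when no event has been seen yet for that subject; those choices are assumptions, not part of the original post.

```r
# Second example from the question
df <- data.frame(
  id    = rep(1, 9),
  time  = c(0, 10, 22, 33, 45, 57, 66, 79, 92),
  event = c(100, 0, 0, 100, 0, 100, 0, 0, 100)
)
df <- df[order(df$id, df$time), ]

# Pull the columns out once; indexing plain vectors is cheaper inside a loop
id    <- df$id
time  <- df$time
event <- df$event

n      <- nrow(df)
tae    <- numeric(n)
tevent <- NA_real_  # time of the most recent event for the current id

for (i in seq_len(n)) {
  # Forget the previous subject's event time when a new id starts
  if (i == 1 || id[i] != id[i - 1]) tevent <- NA_real_

  if (event[i] > 0) {
    # Same as the Fortran branch IF(EVENT.GT.0)
    tevent <- time[i]
    tae[i] <- 0
  } else {
    # Same as the Fortran branch IF(EVENT.EQ.0); NA if no event seen yet
    tae[i] <- time[i] - tevent
  }
}

df$tae <- tae
df
```

Each row is visited exactly once, which is where the O(n) cost comes from.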