Calculate elapsed time since last event

Posted 2019-02-10 23:02

I have a data frame that contains multiple subjects (id), with repeated observations (recorded at times time). Each of the times may or may not be associated with an event (event). An example data frame can be generated with:

set.seed(12345)
id <- c(rep(1, 9), rep(2, 9), rep(3, 9))
time <- c(seq(from = 0, to = 96, by = 12),
      seq(from = 0, to = 80, by = 10),
      seq(from = 0, to = 112, by = 14))
random <- runif(n = 27)
event <- rep(100, 27)

df <- data.frame(cbind(id, time, event, random))
df$event <- ifelse(df$random < 0.55, 0, df$event)
df <- subset(df, select = -c(random))
df$event <- ifelse(df$time == 0, 100, df$event)

I would like to calculate the time between events (tae [time after the last event]), such that the ideal output would look like:

head(ideal_df)
  id time event tae
1  1    0   100   0
2  1   12   100   0
3  1   24   100   0
4  1   36   100   0
5  1   48     0  12
6  1   60     0  24

In Fortran, I use the following code to create the tae variable:

IF(EVENT.GT.0) THEN
  TEVENT = TIME
  TAE = 0
ENDIF

IF(EVENT.EQ.0) THEN
  TAE = TIME - TEVENT
ENDIF

In R, I have attempted both an ifelse and a dplyr solution; however, neither produces my desired output.

# Calculate the time since last event (using ifelse)
df$tae <- ifelse(df$event >= 0, df$tevent = df$time & df$tae = 0, df$tae = df$time - df$tevent)

Error: unexpected '=' in "df$tae <- ifelse(df$event >= 0, df$tevent ="

# Calculate the time since last event (using dplyr)
res <- df %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(tae = time - lag(time))
res 

   id time event tae
1   1    0   100  NA
2   1   12   100  12
3   1   24   100  12
4   1   36   100  12
5   1   48     0  12
6   1   60     0  12

Clearly, neither of these yields my desired output. It appears that assignment inside the ifelse() call is simply not valid R syntax. My attempt at a dplyr solution also fails to account for the event variable...

Lastly, I will also need another variable that records the time until the next event (tue). If anyone has a thought on how best to approach this (perhaps trickier) calculation, please feel free to share it.

Any thoughts regarding how to get one of these working (or an alternative solution) would be greatly appreciated. Thanks!

P.S. -- A reproducible example in which the interval between events changes within an id is presented below:

id <- rep(1, 9)
time <- c(0, 10, 22, 33, 45, 57, 66, 79, 92)
event <- c(100, 0, 0, 100, 0, 100, 0, 0, 100)
df <- data.frame(cbind(id, time, event))

head(df)
  id time event
1  1    0   100
2  1   10     0
3  1   22     0
4  1   33   100
5  1   45     0
6  1   57   100

4 Answers

啃猪蹄的小仙女
#2 · 2019-02-10 23:39

The compactness of dplyr is appealing, but all the unnecessary intermediate calculations really hurt its run time:

library(dplyr)
library(microbenchmark)

loopfun <- function(df){

  event <- (df$event == 100)
  lasttime <- 0

  time <- df$time
  tae <- rep(0, nrow(df))

  for(i in 1:nrow(df)){

    if(event[i]){

      lasttime <- time[i]

    }else{

      tae[i] <- time[i] - lasttime

    }

  }

  df$tae <- tae

  return(df)
}

dplyrfun <- function(df){

  return(df %>%
    mutate(tmp = c(0, diff(time)) * !event,
           tmp2 = cumsum(c(FALSE, as.logical(diff(event))))) %>%
    group_by(tmp2) %>%
    mutate(tae = cumsum(tmp)) %>%
    select(-tmp, -tmp2)
  )
}

microbenchmark(loopfun(df), dplyrfun(df), times = 10000)

Unit: microseconds
         expr      min       lq       mean   median       uq      max neval
  loopfun(df)   57.356   70.035   95.89365   82.109   96.599 49001.19 10000
 dplyrfun(df) 1494.564 1625.274 1875.85263 1705.722 1877.336 50087.32 10000
叼着烟拽天下
#3 · 2019-02-10 23:41

Here's an approach with dplyr:

library(dplyr)
df %>%
  mutate(tmpG = cumsum(c(FALSE, as.logical(diff(event))))) %>%
  group_by(id) %>%
  mutate(tmp_a = c(0, diff(time)) * !event,
         tmp_b = c(diff(time), 0) * !event) %>%
  group_by(tmpG) %>%
  mutate(tae = cumsum(tmp_a),
         tbe = rev(cumsum(rev(tmp_b)))) %>%
  ungroup() %>%
  select(-c(tmp_a, tmp_b, tmpG))

The new columns are the time after the last event (tae) and the time before the next event (tbe).
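To see what the temporary grouping column does, here is the same cumsum/diff trick applied to just the event values of id == 1 from the first example (event_1 below is simply that slice of the data); a new group index starts each time event changes value, so the later cumsum only accumulates within a single run of non-event rows:

event_1 <- c(100, 100, 100, 100, 0, 0, 0, 0, 100)
cumsum(c(FALSE, as.logical(diff(event_1))))
# [1] 0 0 0 0 1 1 1 1 2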

The result:

   id time event tae tbe
1   1    0   100   0   0
2   1   12   100   0   0
3   1   24   100   0   0
4   1   36   100   0   0
5   1   48     0  12  48
6   1   60     0  24  36
7   1   72     0  36  24
8   1   84     0  48  12
9   1   96   100   0   0
10  2    0   100   0   0
11  2   12     0  12  24
12  2   24     0  24  12
13  2   36   100   0   0
14  2   48     0  12  48
15  2   60     0  24  36
16  2   72     0  36  24
17  2   84     0  48  12
18  2   96     0  60   0
19  3    0   100   0   0
20  3   12   100   0   0
21  3   24     0  12  24
22  3   36     0  24  12
23  3   48   100   0   0
24  3   60   100   0   0
25  3   72   100   0   0
26  3   84     0  12  12
27  3   96   100   0   0

The result with the second example:

  id time event tae tbe
1  1    0   100   0   0
2  1   10     0  10  23
3  1   22     0  22  11
4  1   33   100   0   0
5  1   45     0  12  12
6  1   57   100   0   0
7  1   66     0   9  26
8  1   79     0  22  13
9  1   92   100   0   0
我命由我不由天
#4 · 2019-02-10 23:41

You were very close with your dplyr implementation. Try this

df %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(tae = cumsum(event==0)*12)
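This reproduces the head() of the ideal output above, but multiplying by 12 relies on the observations being spaced exactly 12 time units apart. For irregular spacing (as in the second example in the question), a variant that instead tracks the time of the most recent event might look like the following sketch, assuming (as in the example data) that each id starts with an event at time 0:

library(dplyr)

df %>%
  arrange(id, time) %>%
  group_by(id) %>%
  mutate(tevent = cummax(ifelse(event > 0, time, 0)),  # running time of the most recent event
         tae    = time - tevent) %>%                    # 0 on event rows, elapsed time otherwise
  ungroup() %>%
  select(-tevent)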
\"骚年 ilove
5楼-- · 2019-02-10 23:41

I can't think of a way to vectorize it right now, but here's a loop that should be decently quick (O(n)).

event <- (df$event == 100)
lasttime <- 0

time <- df$time
tae <- rep(0, nrow(df))

for(i in 1:nrow(df)){

    if(event[i]){

        lasttime <- time[i]

    }else{

        tae[i] <- time[i] - lasttime

    }

}

df$tae <- tae
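Note that lasttime is never reset at id boundaries; the loop works on the example data only because every id starts with an event at time 0. If that might not hold, one option is to wrap the loop in a helper and apply it to each subject separately. A sketch (tae_one_id is just an illustrative name):

# Compute tae for the rows of a single subject
tae_one_id <- function(d) {
  lasttime <- 0
  tae <- numeric(nrow(d))
  for (i in seq_len(nrow(d))) {
    if (d$event[i] == 100) {
      lasttime <- d$time[i]          # remember when the last event occurred
    } else {
      tae[i] <- d$time[i] - lasttime
    }
  }
  d$tae <- tae
  d
}

# Apply the helper per id and recombine the pieces
df_tae <- do.call(rbind, lapply(split(df, df$id), tae_one_id))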