I have a large data set with unique IDs for individuals as well as dates, and each individual can have multiple encounters.
The code below builds an example of how this data might look:
strDates <- c("09/09/16", "6/7/16", "5/6/16", "2/3/16", "2/1/16", "11/8/16",
"6/8/16", "5/8/16","2/3/16","1/1/16")
Date<-as.Date(strDates, "%m/%d/%y")
ID <- c("A", "A", "A", "A","A","B","B","B","B","B")
Event <- c(1,0,1,0,1,0,1,1,1,0)
sample_df <- data.frame(Date,ID,Event)
sample_df
Date ID Event
1 2016-09-09 A 1
2 2016-06-07 A 0
3 2016-05-06 A 1
4 2016-02-03 A 0
5 2016-02-01 A 1
6 2016-11-08 B 0
7 2016-06-08 B 1
8 2016-05-08 B 1
9 2016-02-03 B 1
10 2016-01-01 B 0
I want to keep all attached information per encounter, but also aggregate the following historical information by ID:
- Number of Previous Encounters
- Number of Previous Events
As an example, let's look at Row 2.
Row 2 is ID A, so I would reference Rows 3-5 (which occurred prior to the Row 2 encounter). Within this group of rows, we see that Rows 3 and 5 both had events.
Number of Previous Encounters for Row 2 = 3
Number of Previous Events for Row 2 = 2
Ideally, I would get the following output:
Date ID Event PrevEnc PrevEvent
1 2016-09-09 A 1 4 2
2 2016-06-07 A 0 3 2
3 2016-05-06 A 1 2 1
4 2016-02-03 A 0 1 1
5 2016-02-01 A 1 0 0
6 2016-11-08 B 0 4 3
7 2016-06-08 B 1 3 2
8 2016-05-08 B 1 2 1
9 2016-02-03 B 1 1 0
10 2016-01-01 B 0 0 0
So far, I have tried working this problem in dplyr with mutate as well as summarise, but neither has let me successfully restrict my aggregation to events that occurred previously for a specific ID. I have also tried some messy for-loops with if-then statements, but I am really just wondering whether a package or technique exists to simplify this process.
Thank you!
Or, if you want to try with data.table, you can solve it with that as well. If you didn't know this package, there is an excellent cheat sheet for most of the operations.
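A minimal sketch of what such a data.table solution might look like, using the grouped := (update by reference) idiom; the column names PrevEnc and PrevEvent are taken from the desired output in the question, not from this answer's original code:

```r
library(data.table)

# Rebuild the question's sample data
sample_df <- data.frame(
  Date = as.Date(c("09/09/16", "6/7/16", "5/6/16", "2/3/16", "2/1/16",
                   "11/8/16", "6/8/16", "5/8/16", "2/3/16", "1/1/16"),
                 "%m/%d/%y"),
  ID = rep(c("A", "B"), each = 5),
  Event = c(1, 0, 1, 0, 1, 0, 1, 1, 1, 0)
)

dt <- as.data.table(sample_df)

# Within each ID, walk the rows in ascending Date order:
# PrevEnc counts up from 0; PrevEvent is the running sum of past
# events, with shift() keeping the current row's event out of the sum.
# := assigns back to the original row positions, so no re-sorting is needed.
dt[order(Date),
   `:=`(PrevEnc   = seq_len(.N) - 1L,
        PrevEvent = cumsum(shift(Event, fill = 0))),
   by = ID]
dt
```

Because := modifies the table by reference, the rows stay in their original (descending-Date) order and the two new columns are simply attached.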
The biggest impediment is the current sort order. Here, I stored an original index point, which I later used to re-sort the data (then removed it). Other than that, the basic idea is to count up from 0 for the encounters, and to use cumsum to count the events as they happen. To that end, lag is used to avoid counting the current event.
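The steps just described can be sketched in dplyr as follows; the helper column name origIndex is my own, not from the original answer:

```r
library(dplyr)

# Rebuild the question's sample data
sample_df <- data.frame(
  Date = as.Date(c("09/09/16", "6/7/16", "5/6/16", "2/3/16", "2/1/16",
                   "11/8/16", "6/8/16", "5/8/16", "2/3/16", "1/1/16"),
                 "%m/%d/%y"),
  ID = rep(c("A", "B"), each = 5),
  Event = c(1, 0, 1, 0, 1, 0, 1, 1, 1, 0)
)

res <- sample_df %>%
  mutate(origIndex = row_number()) %>%   # remember the original row order
  arrange(ID, Date) %>%                  # oldest encounter first within each ID
  group_by(ID) %>%
  mutate(PrevEnc   = row_number() - 1,   # count encounters up from 0
         # running total of past events; lag() keeps the current event out
         PrevEvent = cumsum(lag(Event, default = 0))) %>%
  ungroup() %>%
  arrange(origIndex) %>%                 # re-sort back to the original order
  select(-origIndex)                     # then remove the index column

res
```

This reproduces the desired output from the question, with the rows back in their original descending-Date order.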
As @Frank and @MarkPeterson point out, the biggest hurdle here is that the Date column is sorted in descending order. Here is another approach that does not require re-sorting the Date column. We use row_number() to determine the row index and n() to determine the number of rows (grouped by ID). Since Date is sorted in descending order, the number of previous encounters is simply n() - row_number(). To compute the number of previous events, we again exploit the fact that the Date column is sorted in descending order: we use rev to reverse the order of the Event column, take the cumsum of the lag of this reversed column, and then use rev again to reverse the result back to the original order.