How to aggregate between two dates in R?

2019-05-25 00:40发布

Below are the two tables

Table1
Date                   OldPrice   NewPrice
2014-06-12 09:32:56       0          10
2014-06-27 16:13:36       10         12
2014-08-12 22:41:47       12         13

Table2
Date                   Qty
2014-06-15 18:09:23     5
2014-06-19 12:04:29     4
2014-06-22 13:21:34     3
2014-06-29 19:01:22     6
2014-07-01 18:02:33     3
2014-09-29 22:41:47     6

I want to display the result in this manner

Date                   OldPrice   NewPrice    Qty
2014-06-12 09:32:56       0          10        0
2014-06-27 16:13:36       10         12        12
2014-08-12 22:41:47       12         13        15

I used the command

for(i in 1:nrow(Table1)){

  startDate = Table1$Date[i]
  endDate = Table1$Date[i+1]


 code=aggregate(list(Table2$Qty),
by=list(Table1$Date, Table1$OldPrice, Table1$NewPrice, Date = Table2$Date > startDate  & Table2$Date <= endDate), FUN=sum)

}

I want the quantity to be aggregated between the given dates in first table, i.e between the first and second dates, second and third dates and so on.

Thanks in advance!

标签: r date aggregate
3条回答
小情绪 Triste *
2楼-- · 2019-05-25 01:10

You started with a for loop hence you could do the following the for loops way:

df1 <- read.table(text=
"'Date'                   'OldPrice'   'NewPrice'
'2014-06-12 09:32:56'     '0'          '10'
'2014-06-27 16:13:36'     '10'         '12'
'2014-08-12 22:41:47'     '12'         '13'", stringsAsFactors=F,header=T)

df2 <- read.table(text=
"'Date'                  'Qty'
'2014-06-15 18:09:23'     '5'
'2014-06-19 12:04:29'     '4'
'2014-06-22 13:21:34'     '3'
'2014-06-29 19:01:22'     '6'
'2014-07-01 18:02:33'     '3'" , stringsAsFactors=F, header=T)

df1 <- df1[with(df1, order(Date)),] #order df1 by Date
df1$Date <- as.POSIXct(df1$Date); df2$Date <- as.POSIXct(df2$Date) #convert into datetime formats
values <- vector("list", length = nrow(df1)) #declare a list of specific length of df1

for(i in 1:nrow(df1)){
  for(j in 1:nrow(df2)){
  if(df2$Date[j]>df1$Date[i] & df2$Date[j]<df1$Date[i+1]){
    values[[i]] <- append(values[[i]], df2$Qty[j])
  }
  }
}

df1$Quantity <- c(0, sapply(values, sum)[1:(nrow(df1)-1)]) #replace the leading quantity value with 0 (as per your example)

#                 Date OldPrice NewPrice Quantity
#1 2014-06-12 09:32:56        0       10        0
#2 2014-06-27 16:13:36       10       12       12
#3 2014-08-12 22:41:47       12       13        9

Obviously, more work, but it could help out if you were stuck on for loops.

查看更多
不美不萌又怎样
3楼-- · 2019-05-25 01:15

A little bit verbose idea with dplyr and tidyr:

library(dplyr)
library(tidyr)

full_join(Table1, Table2, by = "Date") %>% 
  arrange(Date) %>% 
  fill(OldPrice, NewPrice, .direction = "up") %>% 
  group_by(OldPrice, NewPrice) %>% 
  summarize(Qty = sum(Qty, na.rm = TRUE)) %>% 
  ungroup() %>% 
  select(Qty) %>% 
  bind_cols(Table1, .)

#                  Date OldPrice NewPrice Qty
# 1 2014-06-12 09:32:56        0       10   0
# 2 2014-06-27 16:13:36       10       12  12
# 3 2014-08-12 22:41:47       12       13   9
查看更多
劳资没心,怎么记你
4楼-- · 2019-05-25 01:30

We can do a join with data.table

library(data.table)
res <- setDT(df1)[df2, roll = -Inf, on = .(Date)][, .(Qty = sum(Qty)),
           .(OldPrice, NewPrice)][df1, on = .(OldPrice, NewPrice)][is.na(Qty), Qty := 0]
setcolorder(res, c(names(df1), "Qty"))
res
#                   Date OldPrice NewPrice Qty
#1: 2014-06-12 09:32:56        0       10   0
#2: 2014-06-27 16:13:36       10       12  12
#3: 2014-08-12 22:41:47       12       13   9
查看更多
登录 后发表回答