How to aggregate between the two dates in R?

2019-06-11 23:12发布

Below are the two tables

Table1
Date                   OldPrice   NewPrice
2014-06-12 09:32:56       0          10
2014-06-27 16:13:36       10         12
2014-08-12 22:41:47       12         13

Table2
Date                   Qty
2014-06-15 18:09:23     5
2014-06-19 12:04:29     4
2014-06-22 13:21:34     3
2014-06-29 19:01:22     6
2014-07-01 18:02:33     3
2014-09-29 22:41:47     6

I want to display the result in this manner

Date                   OldPrice   NewPrice    Qty
2014-06-12 09:32:56       0          10        0
2014-06-27 16:13:36       10         12        12
2014-08-12 22:41:47       12         13        15

I used the command

for(i in 1:nrow(Table1)){

  startDate = Table1$Date[i]
  endDate = Table1$Date[i+1]


 code=aggregate(list(Table2$Qty),
by=list(Table1$Date, Table1$OldPrice, Table1$NewPrice, Date = Table2$Date > startDate  & Table2$Date <= endDate), FUN=sum)

}

I want the quantity to be aggregated between the given dates in first table, i.e between the first and second dates, second and third dates and so on. Also the last date needs to be aggregated till the end of date in the other table.

Thanks in advance!

标签: r date aggregate
1条回答
我命由我不由天
2楼-- · 2019-06-11 23:58

I see what you mean in this question as compared to your previous one

This gives you the exact output you provide:

#                 Date OldPrice NewPrice Quantity
#1 2014-06-12 09:32:56        0       10        0
#2 2014-06-27 16:13:36       10       12       12
#3 2014-08-12 22:41:47       12       13       15

produced by the following code (see the explanation):

#your data & preps
df1 <- read.table(text=
                    "'Date'                   'OldPrice'   'NewPrice'
                  '2014-06-12 09:32:56'     '0'          '10'
                  '2014-06-27 16:13:36'     '10'         '12'
                  '2014-08-12 22:41:47'     '12'         '13'", stringsAsFactors=F,header=T)

df2 <- read.table(text=
                    "'Date'                  'Qty'
                  '2014-06-15 18:09:23'     '5'
                  '2014-06-19 12:04:29'     '4'
                  '2014-06-22 13:21:34'     '3'
                  '2014-06-29 19:01:22'     '6'
                  '2014-07-01 18:02:33'     '3'
                  '2014-09-29 22:41:47'     '6'" , stringsAsFactors=F, header=T)

df1$Date <- as.POSIXct(df1$Date); df2$Date <- as.POSIXct(df2$Date) #convert into datetime formats
df1 <- df1[with(df1, order(Date)),] #order df1 by Date
values <- vector("list", length = nrow(df1)+1) #declare a list of specific length of df1+1
out_of_time_dates_before <- c(); out_of_time_dates_after <- c() #here will be dates that come before or after dates available in df1
names(values) <- c(1:(length(values)-2), "out_of_time_dates_before", "out_of_time_dates_after")

#producing the main outputs
for(j in 1:nrow(df2)){
  print(paste0("Being processed: ", df2$Date[j]))
  for(i in 1:(nrow(df1)-1)){
    if(df2$Date[j]>df1$Date[i] & df2$Date[j]<df1$Date[i+1]){
      values[[i]] <- append(values[[i]], df2$Qty[j])
    } 
  } 
  if(df2$Date[j]<min(df1$Date)){
    out_of_time_dates_before <- append(out_of_time_dates_before, df2$Qty[j])
    values[["out_of_time_dates_before"]] <- append(values[["out_of_time_dates_before"]], df2$Qty[j])
  } else if(df2$Date[j] > max(df1$Date)){
    out_of_time_dates_after <- append(out_of_time_dates_after, df2$Qty[j])
    values[["out_of_time_dates_after"]] <- append(values[["out_of_time_dates_after"]], df2$Qty[j])
  }
}

#aggregating the quantity for the date ranges and all that falls before or after the date ranges not available in df1   
df1$Quantity <- c(0, sapply(values, sum)[1:(nrow(df1)-1)]) #replace the leading quantity value with 0 (as per your example)
df1$Quantity[1] <- df1$Quantity[1]+sapply(values, sum)["out_of_time_dates_before"]
df1$Quantity[length(df1$Quantity)] <- df1$Quantity[length(df1$Quantity)]+sapply(values, sum)["out_of_time_dates_after"]

I think you got interesting problems to solve just a bit unfortunate way of communicating it to the SO community. For instance, it was not clear what to do with the dates in df2 that come before any available range in df1, hence in the code above, I add these quantities (if there are any) to the very first date. This is more generic and analogous to what you expected from the dates in df2 that come after ranges of dates in df1 (add them to the last date of df1).

查看更多
登录 后发表回答