Average of field in half-an-hour window of timesta

2019-08-30 04:12发布

问题:

My dataframe has column-names Timestamp, es and looks like:

          Timestamp     es
2015-04-01 09:07:42     31
2015-04-01 09:08:01   29.5
2015-04-01 09:15:03   18.5
2015-04-01 09:15:05    8.8
2015-04-01 09:15:09    9.6

The time runs till 15:30:30 (around 12000 es data points against each timestamp a day) and the corresponding es.

Does R have some function in some package or code to average the es of all the timestamps within half hour. Sample output should look like:

2015-04-01 09:30:00 Value(Average of all es from 9:00 to 9:30)
2015-04-01 10:00:00 Value(Average of all es from 9:30 to 10:00)
2015-04-01 10:30:00 Value(Average of all es from 10:00 to 10:30)
... (the list goes on till 15:30:30)

回答1:

There are 48 such fixed half-hour windows on each date. (Actually your windows are fixed, not even sliding.) Looks like a trading dataset, so you only want time-windows from 09:00-09:30 to 15:30-16:00(?), hence only 14 windows on each date.

You just create those datetime windows with lubridate::interval, then do a split-apply-combine (aggregation) using either dplyr/data.table to get the mean within each window.

If you post a reproducible example I'll post code.

One point:

  • Take care that if a window is empty, whether you want an average of 0, not NA, or else that those (market-closed) windows should be omitted from the result. You also want to exclude weekends and market holidays. If your data is sparse you'll have to construct those dates yourself.


回答2:

As smci warned, the less information you add about your data, the more limited the help can be. This is a base R approach that creates 30 minute intervals from the dates provided. Therefore, empty intervals will not appear (depending on your desired output this can help or not). The aggregate function applies the mean by the desired group intervals. I expanded your example to incorporate more test intervals:

cuts <- seq(round(min(df$Timestamp), "hours"), max(df$Timestamp)+30*60, "30 min")
aggregate(df$es, list(cut(df$Timestamp, cuts)), mean)
#              Group.1     x
#1 2015-04-01 09:00:00 31.00
#2 2015-04-01 10:00:00 29.50
#3 2015-04-01 11:00:00 13.65
#4 2015-04-01 13:00:00  9.60

Data

df <- structure(list(Timestamp = structure(c(1427893662, 1427897281, 
1427901303, 1427901605, 1427908509), class = c("POSIXct", "POSIXt"
), tzone = ""), es = c(31, 29.5, 18.5, 8.8, 9.6)), .Names = c("Timestamp", 
"es"), row.names = c(NA, -5L), class = "data.frame")