My dataframe has column-names Timestamp, es
and looks like:
Timestamp es
2015-04-01 09:07:42 31
2015-04-01 09:08:01 29.5
2015-04-01 09:15:03 18.5
2015-04-01 09:15:05 8.8
2015-04-01 09:15:09 9.6
The time runs till 15:30:30 (around 12000 es
data points against each timestamp a day) and the corresponding es.
Does R have some function in some package or code to average the es
of all the timestamps within half hour. Sample output should look like:
2015-04-01 09:30:00 Value(Average of all es from 9:00 to 9:30)
2015-04-01 10:00:00 Value(Average of all es from 9:30 to 10:00)
2015-04-01 10:30:00 Value(Average of all es from 10:00 to 10:30)
... (the list goes on till 15:30:30)
There are 48 such fixed half-hour windows on each date. (Actually your windows are fixed, not even sliding.) Looks like a trading dataset, so you only want time-windows from 09:00-09:30 to 15:30-16:00(?), hence only 14 windows on each date.
You just create those datetime windows with lubridate::interval
, then do a split-apply-combine (aggregation) using either dplyr/data.table
to get the mean within each window.
If you post a reproducible example I'll post code.
One point:
- Take care that if a window is empty, whether you want an average of 0, not NA, or else that those (market-closed) windows should be omitted from the result. You also want to exclude weekends and market holidays. If your data is sparse you'll have to construct those dates yourself.
As smci warned, the less information you add about your data, the more limited the help can be. This is a base R
approach that creates 30 minute intervals from the dates provided. Therefore, empty intervals will not appear (depending on your desired output this can help or not). The aggregate function applies the mean by the desired group intervals. I expanded your example to incorporate more test intervals:
cuts <- seq(round(min(df$Timestamp), "hours"), max(df$Timestamp)+30*60, "30 min")
aggregate(df$es, list(cut(df$Timestamp, cuts)), mean)
# Group.1 x
#1 2015-04-01 09:00:00 31.00
#2 2015-04-01 10:00:00 29.50
#3 2015-04-01 11:00:00 13.65
#4 2015-04-01 13:00:00 9.60
Data
df <- structure(list(Timestamp = structure(c(1427893662, 1427897281,
1427901303, 1427901605, 1427908509), class = c("POSIXct", "POSIXt"
), tzone = ""), es = c(31, 29.5, 18.5, 8.8, 9.6)), .Names = c("Timestamp",
"es"), row.names = c(NA, -5L), class = "data.frame")