I have a dataset that contains observations for several weeks at a 2-minute frequency. I want to increase the time interval from 2 minutes to 5 minutes. The problem is that the frequency of the observations is not always the same: theoretically there should be 5 observations every 10 minutes, but usually that is not the case. How can I aggregate the observations with an average function, based on the date and time of each observation? In other words, I want to aggregate into 5-minute intervals even though the number of observations differs from one 5-minute interval to the next. My date and time are stored in timestamp format.
Example Data:
1 2007-09-14 22:56:12 5.39
2 2007-09-14 22:58:12 5.34
3 2007-09-14 23:00:12 5.16
4 2007-09-14 23:02:12 5.54
5 2007-09-14 23:04:12 5.30
6 2007-09-14 23:06:12 5.20
Expected results:
1 2007-09-14 23:00 5.29
2 2007-09-14 23:05 5.34
The answers to this question likely provide good solutions to your problem, showing ways to efficiently aggregate data into time windows.
Essentially, use the avg aggregate with:
OK, so this is just one way to handle this. I hope this gets you thinking about how to convert the data for your analysis needs.
There's a prerequisite for testing this code: you need a table with every possible 1-minute timestamp. There are many ways to build one; I'll just use what I have available, which is one table, dim_time, holding each minute from 00:01:00 through 23:59:00, and another table, dim_date, holding all possible dates. When you join these (ON 1=1) you get every possible minute for every possible day.
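A minimal sketch of that cross join, using Python's built-in sqlite3 module as a stand-in database. The column names (dim_date.d, dim_time.t) and the two sample dates are assumptions, and unlike the answer's dim_time this version also includes 00:00:00.

```python
import sqlite3

# Hypothetical stand-ins for the answer's dim_date and dim_time tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_date (d TEXT)")
conn.execute("CREATE TABLE dim_time (t TEXT)")
conn.executemany("INSERT INTO dim_date VALUES (?)",
                 [("2007-09-14",), ("2007-09-15",)])
# One row per minute of the day, 00:00:00 through 23:59:00.
conn.executemany("INSERT INTO dim_time VALUES (?)",
                 [(f"{h:02d}:{m:02d}:00",) for h in range(24) for m in range(60)])

# Joining ON 1=1 (a cross join) pairs every date with every minute.
all_minutes = [row[0] for row in conn.execute(
    "SELECT d || ' ' || t FROM dim_date JOIN dim_time ON 1=1 ORDER BY 1")]
```

With two dates this yields 2 × 1440 = 2880 minute timestamps.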
EDIT: I thought about this a bit more and realized that you can't go directly from 2-minute to 5-minute intervals, because 5 is not a multiple of 2, so it doesn't add up. I'll follow up on that, but the following code does work once you have some 1-minute data to aggregate!
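A small illustration of why 2-minute data doesn't divide evenly into 5-minute buckets: flooring evenly spaced 2-minute readings to 5-minute boundaries gives alternating counts of 3 and 2 per bucket. The start time and the number of readings are assumptions chosen for the sketch.

```python
from collections import Counter
from datetime import datetime, timedelta

start = datetime(2007, 9, 14, 23, 0, 12)
# Ten readings, one every 2 minutes (20 minutes in total).
readings = [start + timedelta(minutes=2 * i) for i in range(10)]

# Floor each reading's minute to a multiple of 5 and count readings per bucket.
counts = Counter(ts.replace(minute=ts.minute - ts.minute % 5, second=0)
                 for ts in readings)
per_bucket = [counts[k] for k in sorted(counts)]
```

Here per_bucket is [3, 2, 3, 2], so each 5-minute average is taken over a different number of observations.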
--
If your timestamps mark the beginning of each interval, you can use the code inside this function directly, or create the function in your database for ease of access:
You just pass it the integer number of minutes you want to aggregate on (use 1, 2, 3, 4, 5, 6, 10, 12, 15, 20, or 30). Here are a couple of results:
= '2012-01-01 02:02:00'
= '2012-01-01 02:00:00'
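A minimal Python sketch of such a flooring function; the name floor_to_minutes is hypothetical, not the answer's original function. The allowed values are exactly the divisors of 60, which is why flooring the minute-of-hour alone is enough: every bucket boundary lines up with the top of the hour.

```python
from datetime import datetime

# Divisors of 60, so bucket boundaries always align with the top of the hour.
ALLOWED = {1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30}


def floor_to_minutes(ts: datetime, minutes: int) -> datetime:
    """Hypothetical helper: floor ts to the start of its N-minute bucket."""
    if minutes not in ALLOWED:
        raise ValueError(f"minutes must be one of {sorted(ALLOWED)}")
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)
```

For an assumed input of 2012-01-01 02:02:35, floor_to_minutes(..., 2) gives 2012-01-01 02:02:00 and floor_to_minutes(..., 5) gives 2012-01-01 02:00:00.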
Test it out, and add or subtract time with the built-in timestamp functions to handle beginning versus ending timestamps.
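One way to sketch the beginning-versus-ending distinction: compute the bucket's opening edge by flooring, then add one interval length to label the same bucket by its closing edge. Both function names are hypothetical.

```python
from datetime import datetime, timedelta


def bucket_start(ts: datetime, minutes: int) -> datetime:
    """Label a bucket by its opening edge (floor of the minute)."""
    return ts.replace(minute=ts.minute - ts.minute % minutes,
                      second=0, microsecond=0)


def bucket_end(ts: datetime, minutes: int) -> datetime:
    """Label the same bucket by its closing edge instead."""
    return bucket_start(ts, minutes) + timedelta(minutes=minutes)
```

For example, a reading at 22:58:12 falls in the bucket that starts at 22:55:00 and ends at 23:00:00.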
Once you have the timestamp you want, do what Craig said: GROUP BY that timestamp, together with your desired aggregate functions (likely averages).
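A minimal sketch of the GROUP BY step, using Python's sqlite3 module and the question's sample rows in a hypothetical obs table. This sketch floors to 5 minutes by integer-dividing Unix epoch seconds by 300; other databases have their own timestamp functions for this.

```python
import sqlite3

rows = [
    ("2007-09-14 22:56:12", 5.39),
    ("2007-09-14 22:58:12", 5.34),
    ("2007-09-14 23:00:12", 5.16),
    ("2007-09-14 23:02:12", 5.54),
    ("2007-09-14 23:04:12", 5.30),
    ("2007-09-14 23:06:12", 5.20),
]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (ts TEXT, value REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", rows)

# Floor each timestamp to its 5-minute (300 s) bucket, then average per bucket.
QUERY = """
SELECT datetime((CAST(strftime('%s', ts) AS INTEGER) / 300) * 300, 'unixepoch') AS bucket,
       AVG(value) AS avg_value,
       COUNT(*) AS n
FROM obs
GROUP BY bucket
ORDER BY bucket
"""
result = conn.execute(QUERY).fetchall()
```

With these rows the buckets are 22:55 (2 rows, average 5.365), 23:00 (3 rows, about 5.333) and 23:05 (1 row, 5.20). AVG divides by however many rows land in each bucket, so an uneven number of observations per interval is handled automatically.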
You could test/tweak it with:
It may turn out that you want to average the timestamps, for example if your interval duration is volatile. For this, you could write a similar function that rounds the timestamp instead of taking its floor.
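A minimal sketch of the rounding variant; round_to_minutes is a hypothetical name, and ties (exactly halfway) round up. It works in seconds since midnight so a result can roll over into the next day.

```python
from datetime import datetime, timedelta


def round_to_minutes(ts: datetime, minutes: int) -> datetime:
    """Hypothetical helper: round ts to the nearest N-minute boundary."""
    step = minutes * 60
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    secs = (ts - midnight).total_seconds()
    # Add half a step before flooring so halves round up.
    rounded = int((secs + step / 2) // step) * step
    return midnight + timedelta(seconds=rounded)
```

With 5-minute rounding, 22:56:12 goes to 22:55:00 while 22:58:12 goes to 23:00:00, unlike flooring, which puts both at 22:55:00.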
By far the simplest option is to create a reference table in which you store the intervals you are interested in:
(Adapt this to your own RDBMS's date notation.)
Then you just join and aggregate...
NOTE: You only need to create and populate that intervals table once, then you can re-use it many times.
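A minimal sketch of the reference-table approach in SQLite via Python's sqlite3 module, with SQLite's text date notation. The table names (obs, intervals), columns, and the four intervals generated are assumptions. Each interval is half-open [start_ts, end_ts), and a LEFT JOIN keeps intervals that have no observations.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (ts TEXT, value REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)", [
    ("2007-09-14 22:56:12", 5.39), ("2007-09-14 22:58:12", 5.34),
    ("2007-09-14 23:00:12", 5.16), ("2007-09-14 23:02:12", 5.54),
    ("2007-09-14 23:04:12", 5.30), ("2007-09-14 23:06:12", 5.20),
])

# Reference table: one row per 5-minute interval, created once and reused.
conn.execute("CREATE TABLE intervals (start_ts TEXT, end_ts TEXT)")
first = datetime(2007, 9, 14, 22, 55)
conn.executemany("INSERT INTO intervals VALUES (?, ?)", [
    ((first + timedelta(minutes=5 * i)).strftime(FMT),
     (first + timedelta(minutes=5 * (i + 1))).strftime(FMT))
    for i in range(4)
])

# Join each observation to the interval containing it, then aggregate.
result = conn.execute("""
    SELECT i.start_ts, AVG(o.value), COUNT(o.value)
    FROM intervals AS i
    LEFT JOIN obs AS o ON o.ts >= i.start_ts AND o.ts < i.end_ts
    GROUP BY i.start_ts
    ORDER BY i.start_ts
""").fetchall()
```

Plain string comparison works here only because every timestamp uses the same fixed-width format. The empty 23:10 interval comes back with a NULL (None) average and a count of 0.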