I have a PostgreSQL 9.1 database with a table containing a timestamp and a measured value:
'2012-10-25 01:00' 2
'2012-10-25 02:00' 5
'2012-10-25 03:00' 12
'2012-10-25 04:00' 7
'2012-10-25 05:00' 1
... ...
I need to average the value over a range of 8 hours, every hour. In other words, I need the average of 1h-8h, 2h-9h, 3h-10h etc.
I have no idea how to write such a query. I have looked everywhere, but I don't even know what functionality to look for.
The closest I have found are hourly/daily averages or block averages (e.g. 1h-8h, 9h-16h etc.). But in those cases the timestamp is simply converted using the date_trunc() function (as in the example below), which is of no use to me.
What I think I am looking for is something similar to this:
SELECT date_trunc('day', timestamp), max(value)
FROM table_name
GROUP BY date_trunc('day', timestamp);
But then using some kind of 8-hour range for EVERY hour in the group-by clause. Is that even possible?
A window function with a custom frame makes this amazingly simple:
SELECT ts
      ,avg(val) OVER (ORDER BY ts
                      ROWS BETWEEN CURRENT ROW AND 7 FOLLOWING) AS avg_8h
FROM   tbl;
The frame for each average is the current row plus the following 7. This assumes you have exactly one row for every hour. Your sample data seems to imply that, but you did not specify.
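If rows can be missing, a frame defined by time rather than by row count may be closer to what you want. The following is only a sketch: it reuses the tbl, ts and val names from the query above, and it relies on RANGE frames with an interval offset, which were only added in PostgreSQL 11, so it will not run on 9.1.

```sql
-- Requires PostgreSQL 11 or later (RANGE with an offset is not available in 9.1).
-- The frame holds every row whose ts lies between the current row's ts
-- and 7 hours after it (both ends inclusive), however many rows that is.
SELECT ts
      ,avg(val) OVER (ORDER BY ts
                      RANGE BETWEEN CURRENT ROW
                                AND interval '7 hours' FOLLOWING) AS avg_8h
FROM   tbl;
```

With exactly one row per hour this returns the same averages as the ROWS version; with gaps, each average simply covers fewer rows instead of reaching further forward in time.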
As it stands, avg_8h for the final 7 rows of the set (according to ts) is computed with fewer rows, until the value of the last row equals its own average. You did not specify how to deal with this special case.
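One possible way to treat that tail, assuming you would rather get NULL than an average over an incomplete window, is to count the rows in the same frame and only report the average when all 8 are present. This is a sketch using the same tbl, ts and val names as above, not the only reasonable policy:

```sql
-- Named window so the frame is written once and shared by count(*) and avg().
SELECT ts
      ,CASE WHEN count(*) OVER w = 8      -- full 8-row frame available
            THEN avg(val) OVER w
       END AS avg_8h                      -- NULL for the last 7 rows
FROM   tbl
WINDOW w AS (ORDER BY ts
             ROWS BETWEEN CURRENT ROW AND 7 FOLLOWING);
```

Wrapping this in a subquery and filtering on avg_8h IS NOT NULL would drop the incomplete rows altogether.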
The key is to make a virtual table against which to join your result set. The generate_series function can help do that, in the following manner:
SELECT
    start
  , start + interval '8 hours' AS end
FROM (
    SELECT generate_series(
        date'2012-01-01'
      , date'2012-02-02'
      , '1 hour'
    ) AS start
) x;
This produces output something like this:
         start          |          end
------------------------+------------------------
 2012-01-01 00:00:00+00 | 2012-01-01 08:00:00+00
 2012-01-01 01:00:00+00 | 2012-01-01 09:00:00+00
 2012-01-01 02:00:00+00 | 2012-01-01 10:00:00+00
 2012-01-01 03:00:00+00 | 2012-01-01 11:00:00+00
This gives you something to join your data to. In this way, the following query:
SELECT
    y.start
  , round(avg(ts_val.v), 1)  -- one decimal place; plain round() would return whole numbers
FROM
    ts_val,
    (
        SELECT
            start
          , start + interval '8 hours' AS end
        FROM (
            SELECT generate_series(
                date'2012-01-01'
              , date'2012-02-02'
              , '1 hour'
            ) AS start
        ) x
    ) y
WHERE
    ts BETWEEN y.start AND y.end
GROUP BY
    y.start
ORDER BY
    y.start
;
given the following data:
         ts          | v
---------------------+---
 2012-01-01 01:00:00 | 2
 2012-01-01 09:00:00 | 2
 2012-01-01 10:00:00 | 5
(3 rows)
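produces the following results. Note that BETWEEN includes both endpoints, so each window covers the timestamps from start through start + 8 hours inclusive, i.e. up to nine hourly rows rather than eight: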
         start          | round
------------------------+-------
 2012-01-01 00:00:00+00 |   2.0
 2012-01-01 01:00:00+00 |   2.0
 2012-01-01 02:00:00+00 |   3.5
 2012-01-01 03:00:00+00 |   3.5
 2012-01-01 04:00:00+00 |   3.5
 2012-01-01 05:00:00+00 |   3.5
 2012-01-01 06:00:00+00 |   3.5
 2012-01-01 07:00:00+00 |   3.5
 2012-01-01 08:00:00+00 |   3.5
 2012-01-01 09:00:00+00 |   3.5
 2012-01-01 10:00:00+00 |   5.0
(11 rows)
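If you want each window to span exactly 8 hours, a half-open range (start included, end excluded) is one way to do it. The sketch below makes a few assumptions of its own: the win_start and avg_8h names are made up for illustration, v is assumed to be an integer or numeric column (so that round(..., 1) applies), and a LEFT JOIN is used so that hours with no data still appear, with a NULL average.

```sql
-- One row per hour from generate_series, joined to the measurements
-- that fall in [win_start, win_start + 8 hours).
SELECT g.win_start
      ,round(avg(t.v), 1) AS avg_8h            -- NULL when the window is empty
FROM   generate_series(timestamp '2012-01-01'
                      ,timestamp '2012-02-02'
                      ,interval '1 hour') AS g(win_start)
LEFT   JOIN ts_val t
       ON  t.ts >= g.win_start                        -- start is included
       AND t.ts <  g.win_start + interval '8 hours'   -- end is excluded
GROUP  BY g.win_start
ORDER  BY g.win_start;
```

Because the series is generated from timestamp values here, win_start is compared with ts without any time zone conversion.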