Aggregate values over a range of hours, every hour

Posted 2019-03-25 02:06

Question:

I have a PostgreSQL 9.1 database with a table containing a timestamp and a measured value:

'2012-10-25 01:00'   2
'2012-10-25 02:00'   5
'2012-10-25 03:00'   12
'2012-10-25 04:00'   7
'2012-10-25 05:00'   1
...                  ...

I need to average the value over a range of 8 hours, every hour. In other words, I need the average of 1h-8h, 2h-9h, 3h-10h etc.

I have no idea how to write such a query. I have searched extensively, but I do not even know which features to look for.

The closest I can find are hourly/daily averages or block averages (e.g. 1h-8h, 9h-16h etc.). But in those cases the timestamp is simply truncated with the date_trunc() function (as in the example below), which is of no use to me.

What I think I am looking for is a query similar to this:

SELECT    date_trunc('day', timestamp), max(value) 
FROM      table_name
GROUP BY  date_trunc('day', timestamp);

But with some kind of 8-hour range for EVERY hour in the GROUP BY clause. Is that even possible?

Answer 1:

A window function with a custom frame makes this amazingly simple:

SELECT ts
      ,avg(val) OVER (ORDER BY ts
                      ROWS BETWEEN CURRENT ROW AND 7 FOLLOWING) AS avg_8h
FROM tbl;

Live demo on sqlfiddle.

The frame for each average is the current row plus the following 7. This assumes you have exactly one row for every hour. Your sample data seems to imply that, but you did not specify.

As written, avg_8h for the last 7 rows of the set (ordered by ts) is computed over fewer than 8 rows, down to the very last row, whose average is just its own value. You did not specify how to deal with this special case.
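
To make the frame behaviour concrete, here is a minimal Python sketch of what ROWS BETWEEN CURRENT ROW AND n FOLLOWING computes. It is an illustration only, not part of the query: the function name forward_avg and the require_full switch are hypothetical, and the example uses a width of 3 rather than 8 because the question shows only five sample values.

```python
def forward_avg(values, width=8, require_full=False):
    """Average each value with the (width - 1) values that follow it.

    With width=8 this mirrors ROWS BETWEEN CURRENT ROW AND 7 FOLLOWING:
    near the end of the list the frame simply holds fewer rows.
    With require_full=True (a hypothetical option), incomplete frames
    yield None instead of a partial average.
    """
    result = []
    for i in range(len(values)):
        frame = values[i:i + width]  # slicing stops at the end of the list
        if require_full and len(frame) < width:
            result.append(None)
        else:
            result.append(sum(frame) / len(frame))
    return result


# Sample values from the question, already ordered by timestamp.
vals = [2, 5, 12, 7, 1]

partial = forward_avg(vals, width=3)
strict = forward_avg(vals, width=3, require_full=True)
print(partial)
print(strict)
```

The last entry of partial equals the last input value, which is the tail behaviour described above; strict shows one possible way to treat the special case, if partial windows are not wanted.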



Answer 2:

The key is to build a virtual table of time windows against which to join your data. The generate_series function can help with that, in the following manner:

SELECT
    start
    , start + interval '8 hours' as end
FROM (
    SELECT generate_series(
        date'2012-01-01'
        , date'2012-02-02'
        , '1 hour'
    ) AS start
) x;

This produces output like this:

         start          |          end           
------------------------+------------------------
 2012-01-01 00:00:00+00 | 2012-01-01 08:00:00+00
 2012-01-01 01:00:00+00 | 2012-01-01 09:00:00+00
 2012-01-01 02:00:00+00 | 2012-01-01 10:00:00+00
 2012-01-01 03:00:00+00 | 2012-01-01 11:00:00+00
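
For readers who want to see the shape of that virtual table outside the database, the same list of windows can be sketched in Python. This is only an analogy to generate_series, under the assumption of naive (timezone-free) datetimes; hourly_windows is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def hourly_windows(first, last, width=timedelta(hours=8), step=timedelta(hours=1)):
    """Yield (start, end) pairs, one per step from first to last inclusive."""
    start = first
    while start <= last:
        yield start, start + width
        start += step


# The first four windows, matching the rows shown above (without the +00 offset).
windows = list(hourly_windows(datetime(2012, 1, 1, 0), datetime(2012, 1, 1, 3)))
for start, end in windows:
    print(start, "|", end)
```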

This gives you something to join your data to. The following query does exactly that:

SELECT
    y.start
    , round(avg(ts_val.v), 1)
FROM
    ts_val,
    (
        SELECT
            start
            , start + interval '8 hours' as end
        FROM (
            SELECT generate_series(
                date'2012-01-01'
                , date'2012-02-02'
                , '1 hour'
            ) AS start
        ) x
    ) y
WHERE
    ts BETWEEN y.start AND y.end
GROUP BY
    y.start
ORDER BY
    y.start
;

For the following data:

         ts          | v 
---------------------+---
 2012-01-01 01:00:00 | 2
 2012-01-01 09:00:00 | 2
 2012-01-01 10:00:00 | 5
(3 rows)

the query produces the following results:

         start          | round 
------------------------+-------
 2012-01-01 00:00:00+00 |   2.0
 2012-01-01 01:00:00+00 |   2.0
 2012-01-01 02:00:00+00 |   3.5
 2012-01-01 03:00:00+00 |   3.5
 2012-01-01 04:00:00+00 |   3.5
 2012-01-01 05:00:00+00 |   3.5
 2012-01-01 06:00:00+00 |   3.5
 2012-01-01 07:00:00+00 |   3.5
 2012-01-01 08:00:00+00 |   3.5
 2012-01-01 09:00:00+00 |   3.5
 2012-01-01 10:00:00+00 |   5.0
(11 rows)
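
One detail worth checking is that BETWEEN includes both endpoints, so each window runs from start through start + 8 hours inclusive, which for strictly hourly data covers nine timestamps rather than eight. The Python sketch below re-creates the join and grouping on the three sample rows to show the difference between that and a half-open window (ts >= start AND ts < end). It is a model of the logic, not the query itself: window_averages and inclusive_end are hypothetical names, and the window range is shortened to a single day.

```python
from datetime import datetime, timedelta

# The three sample rows (ts, v) from above.
rows = [
    (datetime(2012, 1, 1, 1), 2),
    (datetime(2012, 1, 1, 9), 2),
    (datetime(2012, 1, 1, 10), 5),
]


def window_averages(rows, first, last, inclusive_end=True):
    """Average v per hourly window start, rounded to one decimal.

    inclusive_end=True models ts BETWEEN start AND end;
    inclusive_end=False models ts >= start AND ts < end.
    """
    width = timedelta(hours=8)
    out = {}
    start = first
    while start <= last:
        end = start + width
        vals = [v for ts, v in rows
                if start <= ts and (ts <= end if inclusive_end else ts < end)]
        if vals:  # like the inner join, windows without matching rows are dropped
            out[start] = round(sum(vals) / len(vals), 1)
        start += timedelta(hours=1)
    return out


first, last = datetime(2012, 1, 1), datetime(2012, 1, 2)
between = window_averages(rows, first, last)
half_open = window_averages(rows, first, last, inclusive_end=False)

for start in sorted(between):
    print(start, between[start], half_open.get(start))
```

With the inclusive bounds, the window starting at 02:00 picks up both the 09:00 and the 10:00 rows; with the half-open bounds it only sees the 09:00 row. If a strict 8-hour range is wanted, the half-open comparison is probably the better fit.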