Group by data intervals

Posted 2019-01-14 19:18

I have a single table which stores bandwidth usage on the network over a period of time. One column will contain the date time (primary key) and another column will record the bandwidth. Data is recorded every minute. We will have other columns recording other data at that moment in time.

If the user requests the data at 15-minute intervals (within a 24-hour period, given a start and end date), is it possible to get the data I require with a single query, or would I have to write a stored procedure/cursor to do this? Users may later request data at 5-minute intervals, and so on.

I will most likely be using Postgres, but are there NoSQL options that would be better?

Any ideas?

2 answers
一纸荒年 Trace。
#2 · 2019-01-14 19:48
select
    date_trunc('hour', d) + 
    (((extract(minute from d)::integer / 5 * 5)::text) || ' minute')::interval
    as "from",
    date_trunc('hour', d) + 
    ((((extract(minute from d)::integer / 5 + 1) * 5)::text) || ' minute')::interval
    - '1 second'::interval
    as "to",
    sum(random() * 1000) as bandwidth
from 
    generate_series('2012-01-01', '2012-01-31', '1 minute'::interval) s(d)
group by 1, 2
order by 1, 2
;

That is for 5-minute ranges. For 15-minute ranges, replace the 5 with 15 in both expressions: divide the minute by 15 and multiply the result by 15.
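
If the interval width should be a single number that is easy to change, one possible variation is to bucket on the Unix epoch instead of on the minute part. This is only a sketch, not part of the answer above: it assumes timestamptz values, and the width of 900 seconds (15 minutes) is an illustrative choice. The random bandwidth values are placeholder data, as in the original query.

```sql
-- Sketch: slot width in seconds. 900 = 15 minutes, 300 = 5 minutes.
SELECT to_timestamp(slot * 900)       AS "from"   -- first second of the slot
      ,to_timestamp(slot * 900 + 899) AS "to"     -- last second of the slot
      ,sum(random() * 1000)           AS bandwidth -- placeholder data
FROM  (
   -- integer slot number: whole 900-second blocks since the epoch
   SELECT floor(extract(epoch FROM d) / 900)::bigint AS slot
   FROM   generate_series('2012-01-01'::timestamptz
                        , '2012-01-31'::timestamptz
                        , '1 minute'::interval) s(d)
   ) sub
GROUP  BY slot
ORDER  BY slot;
```

Slots produced this way are aligned to multiples of the width counted from 1970-01-01 00:00 UTC, so they also cover widths that do not divide an hour evenly.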

#3 · 2019-01-14 20:04
WITH t AS (
   SELECT ts, (random()*100)::int AS bandwidth
   FROM   generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
   )

SELECT date_trunc('hour', ts) AS hour_stump
      ,(extract(minute FROM ts)::int / 15) AS min15_slot
      ,count(*) AS rows_in_timeslice               -- optional
      ,sum(bandwidth) AS sum_bandwidth
FROM   t
WHERE  ts >= '2012-09-02 00:00:00+02'::timestamptz -- user's time range
AND    ts <  '2012-09-03 00:00:00+02'::timestamptz -- careful with borders 
GROUP  BY 1, 2
ORDER  BY 1, 2;

The CTE t provides data like your table might hold: one timestamp ts per minute with a bandwidth number. (You don't need that part, you work with your table instead.)
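
For illustration, here is roughly how the same aggregation might look against a real table. The table name bandwidth_log and its columns are hypothetical, chosen only for this sketch. It also assumes PostgreSQL 14 or later, where date_bin() is available and returns the slot start as a single timestamp instead of an hour plus a slot number.

```sql
-- Hypothetical table standing in for the asker's real one.
CREATE TEMP TABLE bandwidth_log (
   ts        timestamptz PRIMARY KEY
  ,bandwidth integer NOT NULL
);

-- Fill it with one random sample per minute, like the CTE t above.
INSERT INTO bandwidth_log
SELECT ts, (random()*100)::int
FROM   generate_series('2012-09-01'::timestamptz
                     , '2012-09-04'::timestamptz
                     , '1 minute'::interval) ts;

-- date_bin(stride, source, origin) needs PostgreSQL 14+.
-- The origin is set to the start of the requested range, so slots line up with it.
SELECT date_bin('15 minutes', ts, '2012-09-02 00:00:00+02'::timestamptz) AS slot_start
      ,count(*)       AS rows_in_timeslice
      ,sum(bandwidth) AS sum_bandwidth
FROM   bandwidth_log
WHERE  ts >= '2012-09-02 00:00:00+02'::timestamptz
AND    ts <  '2012-09-03 00:00:00+02'::timestamptz
GROUP  BY 1
ORDER  BY 1;
```

Changing '15 minutes' to '5 minutes' switches the interval width. On older versions the date_trunc / extract combination above does the same job.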

Here is a very similar solution for a very similar question, with a detailed explanation of how this particular aggregation works:

Here is a similar solution for a similar question concerning running sums, with a detailed explanation and links for the various functions used:

Additional question in comment

WITH -- same as above ...

SELECT DISTINCT ON (1,2)
       date_trunc('hour', ts) AS hour_stump
      ,(extract(minute FROM ts)::int / 15) AS min15_slot
      ,bandwidth AS bandwidth_sample_at_min15
FROM   t
WHERE  ts >= '2012-09-02 00:00:00+02'::timestamptz
AND    ts <  '2012-09-03 00:00:00+02'::timestamptz
ORDER  BY 1, 2, ts DESC;

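This retrieves one un-aggregated sample per 15-minute interval, taken from the last available row in the window. If no row is missing, that is the 15th minute of the slot (minute :14, :29, :44 or :59). The crucial parts are DISTINCT ON and ORDER BY: rows are sorted by slot and then by ts DESC, and DISTINCT ON keeps only the first row of each slot.
More information about the technique used here: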
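
To see DISTINCT ON in isolation, here is a minimal sketch with three hand-written rows. The values are made up for illustration and are not taken from the question.

```sql
-- Three sample rows: minutes 13 and 14 fall into slot 0, minute 15 into slot 1.
SELECT DISTINCT ON (1)
       extract(minute FROM ts)::int / 15 AS min15_slot  -- integer division
      ,ts
      ,bandwidth
FROM  (
   VALUES ('2012-09-02 00:13'::timestamp, 10)
         ,('2012-09-02 00:14', 20)
         ,('2012-09-02 00:15', 30)
   ) v(ts, bandwidth)
ORDER  BY 1, ts DESC;  -- latest row per slot sorts first and is the one kept
```

This should return two rows: the 00:14 sample for slot 0 and the 00:15 sample for slot 1. Sorting by ts ascending instead would keep the first sample of each slot.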
