Group by data intervals

I have a single table that stores bandwidth usage on the network over a period of time. One column holds the timestamp (the primary key) and another records the bandwidth. A row is recorded every minute. Other columns will record further data for that same moment in time.

If the user requests the data in 15-minute intervals (within a 24-hour period, given a start and end date), is it possible to get the data I require with a single query, or would I have to write a stored procedure or cursor to do this? Users may then request data in 5-minute intervals, and so on.

I will most likely be using Postgres, but are there NoSQL options that would be better?

Any ideas?

Tags: sql postgresql nosql aggregate-functions generate-series

2 answers

一纸荒年 Trace。

Reply #2 · 2019-01-14 19:48

select
    date_trunc('hour', d) + 
    (((extract(minute from d)::integer / 5 * 5)::text) || ' minute')::interval
    as "from",
    date_trunc('hour', d) + 
    ((((extract(minute from d)::integer / 5 + 1) * 5)::text) || ' minute')::interval
    - '1 second'::interval
    as "to",
    sum(random() * 1000) as bandwidth
from 
    generate_series('2012-01-01', '2012-01-31', '1 minute'::interval) s(d)
group by 1, 2
order by 1, 2
;

That is for 5-minute ranges. For 15-minute ranges, replace each 5 in the minute arithmetic with 15 (divide by 15, then multiply by 15).
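The minute arithmetic above leaves a short last bucket in each hour when the width does not divide 60 evenly. A minimal sketch of a more general variant, assuming the bucket width is given in seconds (900 here, an illustrative value) and that buckets aligned to the Unix epoch are acceptable. The sample rows and the constant bandwidth of 1 are made up so the result is easy to check:

```sql
-- Sketch: one number (900 seconds = 15 minutes) controls the bucket width.
-- Sample data: 60 one-minute rows, each with a made-up bandwidth of 1.
SELECT to_timestamp(floor(extract(epoch FROM d) / 900) * 900) AS bucket_start
     , sum(1) AS bandwidth            -- 15 rows per bucket, so 15 per bucket
FROM   generate_series('2012-01-01 00:00+00'::timestamptz
                     , '2012-01-01 00:59+00'::timestamptz
                     , interval '1 minute') AS s(d)
GROUP  BY 1
ORDER  BY 1;
```

This should return four buckets of 15 rows each. Changing 900 to 300 gives 5-minute buckets. On PostgreSQL 14 or later, the built-in date_bin(stride, source, origin) function may be a simpler way to do the same kind of bucketing.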


Reply #3 · 2019-01-14 20:04

WITH t AS (
   SELECT ts, (random()*100)::int AS bandwidth
   FROM   generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
   )

SELECT date_trunc('hour', ts) AS hour_stump
      ,(extract(minute FROM ts)::int / 15) AS min15_slot
      ,count(*) AS rows_in_timeslice               -- optional
      ,sum(bandwidth) AS sum_bandwidth
FROM   t
WHERE  ts >= '2012-09-02 00:00:00+02'::timestamptz -- user's time range
AND    ts <  '2012-09-03 00:00:00+02'::timestamptz -- careful with borders 
GROUP  BY 1, 2
ORDER  BY 1, 2;

The CTE t provides data like your table might hold: one timestamp ts per minute with a bandwidth number. (You don't need that part; you work with your own table instead.)
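As a rough sketch of what that could look like against a real table: the name bandwidth_log, its columns, and the inserted sample values are assumptions for illustration, not taken from the question. The hour stump and the slot number are also folded into a single slot_start timestamp, which may be easier to hand to a client.

```sql
-- Hypothetical table standing in for the one described in the question.
CREATE TEMP TABLE bandwidth_log (
   ts        timestamptz PRIMARY KEY
 , bandwidth integer NOT NULL
);

-- Made-up sample rows: 45 minutes of data, bandwidth fixed at 10.
INSERT INTO bandwidth_log (ts, bandwidth)
SELECT ts, 10
FROM   generate_series('2012-09-02 00:00+02'::timestamptz
                     , '2012-09-02 00:44+02'::timestamptz
                     , interval '1 minute') ts;

-- Same aggregation as above, with one timestamp per 15-minute slot.
SELECT date_trunc('hour', ts)
       + (extract(minute FROM ts)::int / 15) * interval '15 minutes' AS slot_start
     , count(*)       AS rows_in_timeslice
     , sum(bandwidth) AS sum_bandwidth
FROM   bandwidth_log
WHERE  ts >= '2012-09-02 00:00+02'::timestamptz   -- user's time range
AND    ts <  '2012-09-03 00:00+02'::timestamptz
GROUP  BY 1
ORDER  BY 1;
```

With this sample data, each of the three slots should hold 15 rows and a sum of 150. Because the WHERE clause compares ts directly, the primary key index on ts can be used for the range.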

Here is a very similar solution for a very similar question - with detailed explanation how this particular aggregation works:

date_trunc 5 minute interval in PostgreSQL

Here is a similar solution for a similar question concerning running sums - with detailed explanation and links for the various functions used:

PostgreSQL: running count of rows for a query 'by minute'

Additional question in comment

WITH -- same as above ...

SELECT DISTINCT ON (1,2)
       date_trunc('hour', ts) AS hour_stump
      ,(extract(minute FROM ts)::int / 15) AS min15_slot
      ,bandwidth AS bandwidth_sample_at_min15
FROM   t
WHERE  ts >= '2012-09-02 00:00:00+02'::timestamptz
AND    ts <  '2012-09-03 00:00:00+02'::timestamptz
ORDER  BY 1, 2, ts DESC;

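This retrieves one un-aggregated sample per 15-minute interval, taken from the last available row in the window. That will be the 15th minute of the slot if the row is not missing. The crucial parts are DISTINCT ON and ORDER BY: DISTINCT ON keeps the first row of each group, and ORDER BY ... ts DESC makes that first row the latest one.
More information about the technique used here: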

Select first row in each GROUP BY group?
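To illustrate how the sort order decides which row DISTINCT ON keeps, here is a small sketch that picks the earliest sample per slot instead of the latest. The sample data is made up: bandwidth is set to the minute of the timestamp, so the returned value shows directly which row was picked.

```sql
WITH t AS (
   -- Made-up data: bandwidth equals the minute value of ts.
   SELECT ts, extract(minute FROM ts)::int AS bandwidth
   FROM   generate_series('2012-09-02 00:00+00'::timestamptz
                        , '2012-09-02 00:29+00'::timestamptz
                        , interval '1 minute') ts
   )
SELECT DISTINCT ON (1, 2)
       date_trunc('hour', ts)            AS hour_stump
     , extract(minute FROM ts)::int / 15 AS min15_slot
     , bandwidth                         AS first_sample_in_slot
FROM   t
ORDER  BY 1, 2, ts;   -- ascending ts: the earliest row per slot is kept
```

Switching the last sort key back to ts DESC returns the latest row of each slot, as in the query above.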
