I have a single table which stores bandwidth usage on the network over a period of time. One column will contain the date time (primary key) and another column will record the bandwidth. Data is recorded every minute. We will have other columns recording other data at that moment in time.
If the user requests the data on 15 minute intervals (within a 24 hour period given start and end date), is it possible with a single query to get the data I require or would I have to write a stored procedure/cursor to do this? Users may then request 5 minute intervals data etc.
I will most likely be using Postgres but are there other NOSQL options which would be better?
Any ideas?
WITH t AS (
SELECT ts, (random()*100)::int AS bandwidth
FROM generate_series('2012-09-01', '2012-09-04', '1 minute'::interval) ts
)
SELECT date_trunc('hour', ts) AS hour_stump
,(extract(minute FROM ts)::int / 15) AS min15_slot
,count(*) AS rows_in_timeslice -- optional
,sum(bandwidth) AS sum_bandwidth
FROM t
WHERE ts >= '2012-09-02 00:00:00+02'::timestamptz -- user's time range
AND ts < '2012-09-03 00:00:00+02'::timestamptz -- careful with borders
GROUP BY 1, 2
ORDER BY 1, 2;
The CTE t
provides data like your table might hold: one timestamp ts
per minute with a bandwidth
number. (You don't need that part, you work with your table instead.)
Here is a very similar solution for a very similar question - with detailed explanation how this particular aggregation works:
- date_trunc 5 minute interval in PostgreSQL
Here is a similar solution for a similar question concerning running sums - with detailed explanation and links for the various functions used:
- PostgreSQL: running count of rows for a query 'by minute'
Additional question in comment
WITH -- same as above ...
SELECT DISTINCT ON (1,2)
date_trunc('hour', ts) AS hour_stump
,(extract(minute FROM ts)::int / 15) AS min15_slot
,bandwidth AS bandwith_sample_at_min15
FROM t
WHERE ts >= '2012-09-02 00:00:00+02'::timestamptz
AND ts < '2012-09-03 00:00:00+02'::timestamptz
ORDER BY 1, 2, ts DESC;
Retrieves one un-aggregated sample per 15 minute interval - from the last available row in the window. This will be the 15th minute if the row is not missing. Crucial parts are DISTINCT ON
and ORDER BY
.
More information about the used technique here:
- Select first row in each GROUP BY group?
select
date_trunc('hour', d) +
(((extract(minute from d)::integer / 5 * 5)::text) || ' minute')::interval
as "from",
date_trunc('hour', d) +
((((extract(minute from d)::integer / 5 + 1) * 5)::text) || ' minute')::interval
- '1 second'::interval
as "to",
sum(random() * 1000) as bandwidth
from
generate_series('2012-01-01', '2012-01-31', '1 minute'::interval) s(d)
group by 1, 2
order by 1, 2
;
That for 5 minutes ranges. For 15 minutes divide by 15.