Aggregate function over a given time interval

2019-02-07 02:24发布

问题:

My SQL is a bit rusty and I'm having quite a bit of difficulty with this problem. Suppose I have a table with a Timestamp column and a Number column. The goal is to return a result set containing the average value for some arbitrarily chosen regular interval.

So, for example, if I had the following initial data, the resulting output with a 5 minute interval would be as follows:

time                               value
-------------------------------    -----
06-JUN-12 12.40.00.000000000 PM      2
06-JUN-12 12.41.35.000000000 PM      3
06-JUN-12 12.43.22.000000000 PM      4
06-JUN-12 12.47.55.000000000 PM      5
06-JUN-12 12.52.00.000000000 PM      2
06-JUN-12 12.54.59.000000000 PM      3
06-JUN-12 12.56.01.000000000 PM      4

OUTPUT:

start_time                         avg_value
-------------------------------    ---------
06-JUN-12 12.40.00.000000000 PM      3
06-JUN-12 12.45.00.000000000 PM      5
06-JUN-12 12.50.00.000000000 PM      2.5
06-JUN-12 12.55.00.000000000 PM      4

Note that this is an Oracle database, so Oracle-specific solutions would work fine. This could, of course, be done with a stored procedure but I was hoping to accomplish the task in a single query.

回答1:

CREATE TABLE tt (time TIMESTAMP, value NUMBER);

INSERT INTO tt (time, value) VALUES ('06-JUN-12 12.40.00.000000000 PM', 2);
INSERT INTO tt (time, value) VALUES ('06-JUN-12 12.41.35.000000000 PM', 3);
INSERT INTO tt (time, value) VALUES ('06-JUN-12 12.43.22.000000000 PM', 4);
INSERT INTO tt (time, value) VALUES ('06-JUN-12 12.47.55.000000000 PM', 5);
INSERT INTO tt (time, value) VALUES ('06-JUN-12 12.52.00.000000000 PM', 2);
INSERT INTO tt (time, value) VALUES ('06-JUN-12 12.54.59.000000000 PM', 3);
INSERT INTO tt (time, value) VALUES ('06-JUN-12 12.56.01.000000000 PM', 4);


WITH tmin AS (
    SELECT MIN(time) t FROM tt
),   tmax AS (
    SELECT MAX(time) t FROM tt
)
SELECT ranges.inf, ranges.sup, AVG(tt.value)
FROM
     (
        SELECT 
            5*(level-1)*(1/24/60) + tmin.t as inf,
            5*(level)*(1/24/60) + tmin.t as sup
        FROM tmin, tmax
        CONNECT BY (5*(level-1)*(1/24/60) + tmin.t) < tmax.t
    ) ranges JOIN tt ON tt.time BETWEEN ranges.inf AND ranges.sup
GROUP BY ranges.inf, ranges.sup
ORDER BY ranges.inf

fiddle: http://sqlfiddle.com/#!4/9e314/11

edit: beated by Justin, as usual... :-)



回答2:

Something like

with st 
  as (SELECT to_timestamp( '2012-06-06 12:40:00', 'yyyy-mm-dd hh24:mi:ss') + 
               numtodsinterval((level-1)*5, 'MINUTE') start_time,
             to_timestamp( '2012-06-06 12:40:00', 'yyyy-mm-dd hh24:mi:ss') + 
               numtodsinterval(level*5, 'MINUTE') end_time
        from dual
     connect by level <= 10)
SELECT st.start_time, avg( yt.value )
  FROM your_table yt,
       st
 WHERE yt.time between st.start_time and st.end_time

should work. Rather than generating 10 intervals and hard-coding the lowest interval, you can enhance the query to derive the starting point and the number of rows from the MIN(time) and MAX(time) in the table.



回答3:

Justin's and Sebas' answers can be extended with a LEFT JOIN to eliminate "gaps", which is often desirable.

If that's not necessary, as an alternative, we can go old school Oracle DATE arithmetic...

SELECT TRUNC(t.time)+FLOOR(TO_CHAR(t.time,'sssss')/300)*300/86400 AS time
     , AVG(t.value) AS avg_value
  FROM foo t
 WHERE t.time IS NOT NULL
 GROUP BY TRUNC(t.time)+FLOOR(TO_CHAR(t.time,'sssss')/300)*300/86400
 ORDER BY TRUNC(t.time)+FLOOR(TO_CHAR(t.time,'sssss')/300)*300/86400

Let's unpack that a bit. We can separate the date and time components, using TRUNC to get the date portion, and using a TO_CHAR to return the number of seconds since midnight. We know 5 minutes is 300 seconds, and we know there are 86400 seconds in a day. So we can divide the number of seconds by 300, and take the FLOOR of that (just the integer portion), which rounds us down to the nearest 5 minute boundary. We multiply that back (by 300), to get seconds again, and then divide that by the number of seconds in a day (86400), and we can add that back to the (truncated) date portion.

Painful, yes. But blazingly fast.

NOTE: this returns the rounded time value as a DATE, this could be cast back to a timestamp if needed, but for even 5 minute boundaries, a DATE has sufficient resolution.

As a benefit of this approach, for a large table, we can boost performance of the query by adding a covering index for this query:

CREATE INDEX foo_FBX1
ON foo (TRUNC(t.time)+FLOOR(TO_CHAR(t.time,'sssss')/300)*300/86400,value);

ADDENDUM:

MiMo provided an answer for SQL Server, suggesting that it would be adaptable for Oracle. Here is an adaptation of that approach in Oracle. Note that Oracle does not provide equivalents for the DATEDIFF and DATEADD functions. Oracle uses simple arithmetic instead.

SELECT TO_DATE('00010101','YYYYMMDD')+FLOOR((t.time-TO_DATE('00010101','YYYYMMDD'))*288)/288
       AS time
     , AVG(t.value) AS avg_value
  FROM foo t
 WHERE t.time IS NOT NULL
 GROUP BY TO_DATE('00010101','YYYYMMDD')+FLOOR((t.time-TO_DATE('00010101','YYYYMMDD'))*288)/288
 ORDER BY TO_DATE('00010101','YYYYMMDD')+FLOOR((t.time-TO_DATE('00010101','YYYYMMDD'))*288)/288

The choice of Jan 1, 0001 A.D. as a base date is arbitrary, but I didn't want to mess with negative values, and figuring out if FLOOR would be right, or whether we would need to use CEIL with negative numbers. (The magic number 288 is a result of 1440 minutes in a day divided by 5). In this case, we are taking the fractional day, multiplying by 1440 and dividing by 5, and taking integer portion of that, and then putting it back to fractional days.

It is tempting to pull that "base date" from a PL/SQL package, or get it from a subquery, but doing either of those might prevent this expression from being deterministic. And we'd really like to keep open the option of creating a function based index.

My preference is to avoid the need for including a "base date" in the calculation.



回答4:

This is a solution for SQL Server:

declare @startDate datetime = '2000-01-01T00:00:00'

declare @interval int = 5

select 
  DATEADD(mi, (DATEDIFF(mi, @startDate, time)/@interval)*@interval, @startDate), 
  AVG(value)
from 
  table
group by
  DATEDIFF(mi, @startDate, time)/@interval
order by   
  DATEDIFF(mi, @startDate, time)/@interval

The start date is arbitrary. The idea is that you compute the number of minutes from the start date, and then group by this number divided by the interval.

It should be adaptable to Oracle easily using the equivalent for DATEADD and DATEDIFF