Group table into 15 minute intervals

2020-02-07 02:58发布

T-SQL, SQL Server 2008 and up

Given a sample table of

 StatusSetDateTime   | UserID | Status    | StatusEndDateTime   | StatusDuration(in seconds)
============================================================================
 2012-01-01 12:00:00 | myID   | Available | 2012-01-01 13:00:00 | 3600

I need to break that down into a view that uses 15 minute intervals for example:

IntervalStart       | UserID | Status | Duration

===========================================

2012-01-01 12:00:00 | myID | Available | 900 

2012-01-01 12:15:00 | myID | Available | 900

2012-01-01 12:30:00 | myID | Available | 900 

2012-01-01 12:45:00 | myID | Available | 900 

2012-01-01 13:00:00 | myID | Available | 0

etc....

Now I've been able to search around and find some queries that will break down I found something similar for MySql Here :

And something for T-SQL Here

But on the second example they are summing the results whereas I need to divide the total duration by the interval time (900 seconds) by user by status.

I was able to adapt the examples in the second link to split everything into intervals but the total duration time is returned and I cannot quite figure out how to get the Interval durations to split (and still sum up to the total original duration).

Thanks in advance for any insight!

edit : First Attempt

 ;with cte as 
    (select MIN(StatusDateTime) as MinDate
          , MAX(StatusDateTime) as MaxDate
          , convert(varchar(14),StatusDateTime, 120) as StartDate
          , DATEPART(minute, StatusDateTime) /15 as GroupID
          , UserID
          , StatusKey
          , avg(StateDuration) as AvgAmount
     from AgentActivityLog
     group by convert(varchar(14),StatusDateTime, 120)
         , DATEPART(minute, StatusDateTime) /15
         , Userid,StatusKey)

  select dateadd(minute, 15*GroupID, CONVERT(datetime,StartDate+'00'))
         as [Start Date]
       , UserID, StatusKey, AvgAmount as [Average Amount]
  from cte

edit : Second Attempt

;With cte As
   (Select DateAdd(minute
                   , 15 * (DateDiff(minute, '20000101', StatusDateTime) / 15)
                   , '20000101') As StatusDateTime
         , userid, statuskey, StateDuration
    From AgentActivityLog)

 Select StatusDateTime, userid,statuskey,Avg(StateDuration)
 From cte
 Group By StatusDateTime,userid,statuskey;

5条回答
劫难
2楼-- · 2020-02-07 03:33

It is relatively simple if you have a helper table with every 15-minute timestamp, which you JOIN to your base table via BETWEEN. You can build the helper table on the fly or keep it permanently in your database. Simple for the next guy at your company to figure out too:

// declare a table and a timestamp variable
declare @timetbl table(t datetime)
declare @t datetime

// set the first timestamp
set @t = '2012-01-01 00:00:00'

// set the last timestamp, can easily be extended to cover many years
while @t <= '2013-01-01'
begin
    // populate the table with a new row, every 15 minutes
    insert into @timetbl values (@t)
    set @t = dateadd(mi, 15, @t)
end


// now the Select query:
select 
   tt.t, aal.UserID, aal.Status,
   case when aal.StatusEndDateTime <= tt.t then 0 else 900 end as Duration
   // using a shortcut for Duration, based on your comment that Start/End are always on the quarter-hour, and thus always 900 seconds or zero

from 
   @timetbl tt 
      INNER JOIN AgentActivityLog aal 
         on tt.t between aal.StatusSetDateTime and aal.StatusEndDateTime

order by
  aal.UserID, tt.t
查看更多
一夜七次
3楼-- · 2020-02-07 03:34
;with cte_max as 
(
   select dateadd(mi, -15, max(StatusEndDateTime)) as EndTime, min(StatusSetDateTime) as StartTime
   from AgentActivityLog
), times as
(
    select StartTime as Time from cte_max
    union all
    select dateadd(mi, 15, c.Time)
    from times as c
        cross join cte_max as cm
    where c.Time <= cm.EndTime
)
select
    t.Time, A.UserID, A.Status,
    case
        when t.Time = A.StatusEndDateTime then 0
        else A.StatusDuration / (count(*) over (partition by A.StatusSetDateTime, A.UserID, A.Status) - 1)
    end as Duration
from AgentActivityLog as A
    left outer join times as t on t.Time >= A.StatusSetDateTime and t.Time <= A.StatusEndDateTime

sql fiddle demo

查看更多
孤傲高冷的网名
4楼-- · 2020-02-07 03:42

I've never been comfortable with using date math to split things up into partitions. It seems like there are all kinds of pitfalls to fall into.

What I prefer to do is to create a table (pre-defined, table-valued function, table variable) where there's one row for each date partition range. The table-valued function approach is particularly useful because you can build it for arbitrary ranges and partition sizes as you need. Then, you can join to this table to split things out.

paritionid starttime     endtime
---------- ------------- -------------
1          8/1/2012 5:00 8/1/2012 5:15
2          8/1/2012 5:15 8/1/2012 5:30
...

I can't speak to the performance of this method, but I find the queries are much more intuitive.

查看更多
Melony?
5楼-- · 2020-02-07 03:44

Here's a query that will do the job for you without requiring helper tables. (I have nothing against helper tables, they are useful and I use them. It is also possible to not use them sometimes.) This query allows for activities to start and end at any times, even if not whole minutes ending in :00, :15, :30, :45. If there will be millisecond portions then you'll have to do some experimenting because, following your model, I only went to second resolution.

If you have a known hard maximum duration, then remove @MaxDuration and replace it with that value, in minutes. N <= @MaxDuration is crucial to the query performing well.

DECLARE @MaxDuration int;
SET @MaxDuration = (SELECT Max(StatusDuration) / 60 FROM #AgentActivityLog);

WITH
L0 AS(SELECT 1 c UNION ALL SELECT 1),
L1 AS(SELECT 1 c FROM L0, L0 B),
L2 AS(SELECT 1 c FROM L1, L1 B),
L3 AS(SELECT 1 c FROM L2, L2 B),
L4 AS(SELECT 1 c FROM L3, L3 B),
L5 AS(SELECT 1 c FROM L4, L4 B),
Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) n FROM L5)
SELECT
   S.IntervalStart,
   Duration = DateDiff(second, S.IntervalStart, E.IntervalEnd)
FROM
   #AgentActivityLog L
   CROSS APPLY (
      SELECT N, Offset = (N.N - 1) * 900
      FROM Nums N
      WHERE N <= @MaxDuration
   ) N
   CROSS APPLY (
      SELECT Edge =
         DateAdd(second, N.Offset, DateAdd(minute,
            DateDiff(minute, '20000101', L.StatusSetDateTime)
            / 15 * 15, '20000101')
         )
   ) G
   CROSS APPLY (
      SELECT IntervalStart = Max(T.BeginTime)
      FROM (
         SELECT L.StatusSetDateTime
         UNION ALL SELECT G.Edge
      ) T (BeginTime)
   ) S
   CROSS APPLY (
      SELECT IntervalEnd = Min(T.EndTime)
      FROM (
         SELECT L.StatusEndDateTime
         UNION ALL SELECT G.Edge + '00:15:00'
      ) T (EndTime)
   ) E
WHERE
   N.Offset <= L.StatusDuration
ORDER BY
   L.StatusSetDateTime,
   S.IntervalStart;

Here is setup script if you want to try it:

CREATE TABLE #AgentActivityLog (
    StatusSetDateTime datetime,
    StatusEndDateTime datetime,
    StatusDuration AS (DateDiff(second, 0, StatusEndDateTime - StatusSetDateTime))
);

INSERT #AgentActivityLog -- weird end times
SELECT '20120101 12:00:00', '20120101 13:00:00'
UNION ALL SELECT '20120101 13:00:00', '20120101 13:27:56'
UNION ALL SELECT '20120101 13:27:56', '20120101 13:28:52'
UNION ALL SELECT '20120101 13:28:52', '20120120 11:00:00'

INSERT #AgentActivityLog -- 15-minute quantized end times
SELECT '20120101 12:00:00', '20120101 13:00:00'
UNION ALL SELECT '20120101 13:00:00', '20120101 13:30:00'
UNION ALL SELECT '20120101 13:30:00', '20120101 14:00:00'
UNION ALL SELECT '20120101 14:00:00', '20120120 11:00:00'

Also, here's a version that expects ONLY times that have whole minutes ending in :00, :15, :30, or :45.

DECLARE @MaxDuration int;
SET @MaxDuration = (SELECT Max(StatusDuration) / 60 FROM #AgentActivityLog);

WITH
L0 AS(SELECT 1 c UNION ALL SELECT 1),
L1 AS(SELECT 1 c FROM L0, L0 B),
L2 AS(SELECT 1 c FROM L1, L1 B),
L3 AS(SELECT 1 c FROM L2, L2 B),
L4 AS(SELECT 1 c FROM L3, L3 B),
L5 AS(SELECT 1 c FROM L4, L4 B),
Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 0)) n FROM L5)
SELECT
   S.IntervalStart,
   Duration = CASE WHEN Offset = StatusDuration THEN 0 ELSE 900 END
FROM
   #AgentActivityLog L
   CROSS APPLY (
      SELECT N, Offset = (N.N - 1) * 900
      FROM Nums N
      WHERE N <= @MaxDuration
   ) N
   CROSS APPLY (
      SELECT IntervalStart = DateAdd(second, N.Offset, L.StatusSetDateTime)
   ) S
WHERE
   N.Offset <= L.StatusDuration   
ORDER BY
   L.StatusSetDateTime,
   S.IntervalStart;

It really seems like having the final 0 Duration row is not correct, because then you can't just order by IntervalStart as there are duplicate IntervalStart values. What is the benefit of having rows that add 0 to the total?

查看更多
男人必须洒脱
6楼-- · 2020-02-07 03:54

You can use a recursive Common Table Expression, where you keep adding your duration while the StatusEndDateTime is greater than the IntervalStart e.g.

;with cte as (
    select StatusSetDateTime as IntervalStart
        ,UserID
        ,Status
        ,StatusDuration/(datediff(mi, StatusSetDateTime, StatusEndDateTime)/15) as Duration
        , StatusEndDateTime
    From AgentActivityLog
    Union all
    Select DATEADD(ss, Duration, IntervalStart) as IntervalStart
        , UserID
        , Status
        , case when DATEADD(ss, Duration, IntervalStart) = StatusEndDateTime then 0 else Duration end as Duration
        , StatusEndDateTime
    From cte
    Where IntervalStart < StatusEndDateTime
)

select IntervalStart, UserID, Status, Duration from cte
查看更多
登录 后发表回答