Detect gaps over 30 min in timestamp column

2019-09-20 04:44发布

问题:

I have read up on and attempted using the standard method of gaps and island detection in a series with no success because I need to be able to ignore any gaps less than 30 minutes. I can not use a cursor due to performance issues.

Everytime there is a gap of at least 30 min, I need a new row with the start and end. If there are no gaps of at least 30, result would be one row with the min and max of the timestamps. If there is 1 gap of at least 30, there would be 2 rows - from the start of the series to the gap and from the gap to the end. If there are more gaps, we get rows for each interval between the gaps, etc.

Input:

timestamp 

2015-07-15 15:01:21 
2015-07-15 15:17:44 
2015-07-15 15:17:53 
2015-07-15 15:18:34 
2015-07-15 15:21:41 
2015-07-15 15:58:12 
2015-07-15 15:59:12 
2015-07-15 16:05:12
2015-07-15 17:02:12

Desired output :

from | to

2015-07-15 15:01:21 | 2015-07-15 15:21:41 
2015-07-15 15:58:12 | 2015-07-15 16:05:12
2015-07-15 17:02:12 | 2015-07-15 17:02:12

回答1:

Easy solution using common table expression. Compare with cursor performance if you have at least 1000 rows.

create table #tmp (Dt datetime)

insert into #tmp values 
('2015-07-15 15:01:21'),
('2015-07-15 15:17:44'), 
('2015-07-15 15:17:53'), 
('2015-07-15 15:18:34'), 
('2015-07-15 15:21:41'), 
('2015-07-15 15:58:12'), 
('2015-07-15 15:59:12'), 
('2015-07-15 16:05:12'),
('2015-07-15 17:02:12')

;with tbl as (
select dt, row_number() over(order by dt) rn
from #tmp
)
select t1.dt [from],t2.dt [to], datediff(minute,t1.dt,t2.dt) gap
from tbl t1 
inner join tbl t2 on t1.rn+1 = t2.rn
where datediff(minute,t1.dt,t2.dt) >30


回答2:

An approach like the one below would seem to fit the bill assuming the performance is okay. I used the sys.all_tables catalog view to simulate a log table like the one in your example. You can vary the first argument to datediff to get different results.

WITH [Sequenced_Entries] AS
(
    SELECT
        row_number() OVER(ORDER BY [modify_date] ASC) AS 'sequence',
        [modify_date] AS 'event_date'
    FROM
        sys.all_objects
)
SELECT
    f.[sequence] AS 'from_event_sequence',
    f.[event_date] AS 'from_event_date',
    t.[sequence] AS 'to_event_sequence',
    t.[event_date] AS 'to_event_date'
FROM
    [Sequenced_Entries] AS f
        INNER JOIN
    [Sequenced_Entries] AS t
        ON (f.[sequence] = t.[sequence] - 1)
WHERE
    datediff(second, f.[event_date], t.[event_date]) < 30
ORDER BY
    f.[sequence] ASC

Don't forget the statement before this one in the batch must be terminated with a ; or this statement must start with one.



回答3:

;with Boundaries as (
    select
        "timestamp" as Stamp,
        coalesce(
            case when datediff(second, prev_timestamp, "timestamp") >= 1800 
                then 1 else 0 end,
            1
        ) as IsBoundary
    from
        T t cross apply (
            select max(t2."timestamp") as prev_timestamp from T t2
            where t2."timestamp" < t."timestamp"
        ) as n
), Blocks as (
    select Stamp, sum(IsBoundary) over (order by Stamp) as BlockNum
    from Boundaries
)
select min(Stamp) as "from", max(Stamp) as "to"
from Blocks
group by BlockNum

Remember that datediff() just counts interval boundaries so you need to take some care with the time difference. I'm using 1800 seconds here.

If you had a later edition of SQL Server you could use lead()/lag() to look for the gaps. But instead of an inner join the cross apply will hopefully be significantly faster.

I think your title "find gaps over 30 minutes" as well as a comment to "ignore gaps less than 30 minutes" has misled some posters who thought that you were only seeking to find the rows that bounded a gap of 30 minutes. This is the only one that addresses the problem according to your desired output. (Test here.)

Instead of using analytic sum() over (order by...) it's a simple matter of replacing it with a scalar subquery.

...
), Blocks as (
    select
      Stamp,
      (select sum(b2.IsBoundary) from Boundaries b2 where b2.Stamp <= b.Stamp) as BlockNum
    from Boundaries b
) ...


回答4:

;with cc as (
  select Dt 
    , (select top 1 Dt from #tmp where Dt > ot.Dt order by Dt) as NextDt
  from #tmp ot
)
select Dt AS [FROM], NextDt AS [TO]
from cc
where DATEDIFF(minute, Dt, NextDt) >= 30
order by Dt

Test

create table #tmp (Dt datetime)

insert into #tmp values 
('2015-07-15 15:01:21'),
('2015-07-15 15:17:44'), 
('2015-07-15 15:17:53'), 
('2015-07-15 15:18:34'), 
('2015-07-15 15:21:41'), 
('2015-07-15 15:58:12'), 
('2015-07-15 15:59:12'), 
('2015-07-15 16:05:12'),
('2015-07-15 17:02:12')

Result

FROM                           TO
2015-07-15 15:21:41.000        2015-07-15 15:58:12.000
2015-07-15 16:05:12.000        2015-07-15 17:02:12.000

If a post answers your question, please Mark it as answer