Combine continuous datetime intervals by type

2019-04-29 14:16发布

站内文章 / SQL-Server

34 0

我欲成王，谁敢阻挡

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Say we have such a table:

declare @periods table (
    s date, 
    e date,
    t tinyint
);

with date intervals without gaps ordered by start date (s)

insert into @periods values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

All date intervals have different types (t).

It is required to combine date intervals of the same type where they are not broken by intervals of the other types (having all intervals ordered by start date).

So the result table should look like:

      s     |      e     |  t
------------|------------|-----
 2013-01-01 | 2013-01-02 |  3
 2013-01-02 | 2013-01-05 |  1
 2013-01-05 | 2013-01-08 |  2
 2013-01-08 | 2013-01-09 |  1

Any ideas how to do this without cursor?

I've got one working solution:

declare @periods table (
    s datetime primary key clustered, 
    e datetime,
    t tinyint,
    period_number int   
);

insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

declare @t tinyint = null;  
declare @PeriodNumber int = 0;
declare @anchor date;

update @periods
    set  period_number = @PeriodNumber, 
    @PeriodNumber = case
                        when @t <> t
                            then  @PeriodNumber + 1
                        else
                            @PeriodNumber
                    end,
    @t = t,
    @anchor = s
option (maxdop 1);

select 
    s = min(s),
    e = max(e),
    t = min(t)
from 
    @periods    
group by 
    period_number
order by 
    s;

but I doubt if I can rely on such a behavior of UPDATE statement?

I use SQL Server 2008 R2.

Edit:

Thanks to Daniel and this article: http://www.sqlservercentral.com/articles/T-SQL/68467/

I found three important things that were missed in the solution above:

There must be clustered index on the table
There must be anchor variable and call of the clustered column
Update statement should be executed by one processor, i.e. without parallelism

I've changed the above solution in accordance with these rules.

回答1:

Since your ranges are continuous, the problem essentially becomes a gaps-and-islands one. If only you had a criterion to help you to distinguish between different sequences with the same t value, you could group all the rows using that criterion, then just take MIN(s), MAX(e) for every group.

One method of obtaining such a criterion is to use two ROW_NUMBER calls. Consider the following query:

SELECT
  *,
  rnk1 = ROW_NUMBER() OVER (               ORDER BY s),
  rnk2 = ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
FROM @periods
;

For your example it would return the following set:

s           e           t   rnk1  rnk2
----------  ----------  --  ----  ----
2013-01-01  2013-01-02  3   1     1
2013-01-02  2013-01-04  1   2     1
2013-01-04  2013-01-05  1   3     2
2013-01-05  2013-01-06  2   4     1
2013-01-06  2013-01-07  2   5     2
2013-01-07  2013-01-08  2   6     3
2013-01-08  2013-01-09  1   7     3

The interesting thing about the rnk1 and rnk2 rankings is that if you subtract one from the other, you will get values that, together with t, uniquely identify every distinct sequence of rows with the same t:

s           e           t   rnk1  rnk2  rnk1 - rnk2
----------  ----------  --  ----  ----  -----------
2013-01-01  2013-01-02  3   1     1     0
2013-01-02  2013-01-04  1   2     1     1
2013-01-04  2013-01-05  1   3     2     1
2013-01-05  2013-01-06  2   4     1     3
2013-01-06  2013-01-07  2   5     2     3
2013-01-07  2013-01-08  2   6     3     3
2013-01-08  2013-01-09  1   7     3     4

Knowing that, you can easily apply grouping and aggregation. This is what the final query might look like:

WITH partitioned AS (
  SELECT
    *,
    g = ROW_NUMBER() OVER (               ORDER BY s)
      - ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
  FROM @periods
)
SELECT
  s = MIN(s),
  e = MAX(e),
  t
FROM partitioned
GROUP BY
  t,
  g
;

If you like, you can play with this solution at SQL Fiddle.

回答2:

How about this?

declare @periods table (
    s datetime primary key, 
    e datetime,
    t tinyint,
    s2 datetime
);

insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);

update @periods set s2 = s;

while @@ROWCOUNT > 0
begin
    update p2 SET s2=p1.s
    from @periods p1
    join @PERIODS P2 ON p2.t = p1.t AND p2.s2 = p1.e;
end

select s2 as s, max(e) as e, min(t) as t
from @periods
group by s2
order by s2;

回答3:

a possibly solution to avoid update and cursor should be using common table expressions...

like this...

    declare @periods table (
        s date, 
        e date,
        t tinyint
    );


    insert into @periods values
    ('2013-01-01' , '2013-01-02', 3),
    ('2013-01-02' , '2013-01-04', 1),
    ('2013-01-04' , '2013-01-05', 1),
    ('2013-01-05' , '2013-01-06', 2),
    ('2013-01-06' , '2013-01-07', 2),
    ('2013-01-07' , '2013-01-08', 2),
    ('2013-01-08' , '2013-01-09', 1);

    with cte as ( select 0 as n
                        ,p.s as s
                        ,p.e as e
                        ,p.t
                        ,case when p2.s is null then 1 else 0 end fl_s
                        ,case when p3.e is null then 1 else 0 end fl_e
                  from @periods p
                  left outer join @periods p2
                  on p2.e = p.s
                  and p2.t = p.t
                  left outer join @periods p3
                  on p3.s = p.e
                  and p3.t = p.t

                  union all 

                  select  n+1 as n
                        , p2.s as s
                        , p.e as e
                        ,p.t
                        ,case when not exists(select * from @periods p3 where p3.e =p2.s and p3.t=p2.t) then 1 else 0 end as fl_s
                        ,p.fl_e as fl_e
                  from cte p
                  inner join @periods p2
                  on p2.e = p.s
                  and p2.t = p.t
                  where p.fl_s = 0

                  union all 

                  select  n+1 as n
                        , p.s as s
                        , p2.e as e
                        ,p.t
                        ,p.fl_s as fl_s
                        ,case when not exists(select * from @periods p3 where p3.s =p2.e and p3.t=p2.t) then 1 else 0 end as fl_e
                  from cte p
                  inner join @periods p2
                  on p2.s = p.e
                  and p2.t = p.t
                  where p.fl_s = 1
                  and p.fl_e = 0
    )
    ,result as (select s,e,t,COUNT(*) as count_lines
                 from cte
                 where fl_e = 1
                 and fl_s = 1
                 group by s,e,t
                 )
    select * from result
    option(maxrecursion 0)

resultset achieved...

    s           e           t   count_lines
    2013-01-01  2013-01-02  3   1
    2013-01-02  2013-01-05  1   2
    2013-01-05  2013-01-08  2   3
    2013-01-08  2013-01-09  1   1

回答4:

Hooray! I've found the solution that suits me and it is done without iteration

with cte1 as (   
    select s, t  from @periods
    union all
    select max(e), null from @periods
),
cte2 as (
    select rn = row_number() over(order by s), s, t from cte1   
),
cte3 as (
    select 
        rn = row_number() over(order by a.rn),
        a.s,
        a.t 
    from 
        cte2 a 
        left join cte2 b on a.rn = b.rn + 1 and a.t = b.t
    where
        b.rn is null 
)
select 
    s = a.s, 
    e = b.s, 
    a.t  
from 
    cte3 a 
    inner join cte3 b on b.rn = a.rn + 1;

Thanks everyone for sharing your thoughts and solutions!

Details:

cte1 returns the chain of dates with the types after them:

s          t
---------- ----
2013-01-01 3
2013-01-02 1
2013-01-04 1
2013-01-05 2
2013-01-06 2
2013-01-07 2
2013-01-08 1
2013-01-09 NULL  -- there is no type *after* the last date

ct2 just add row number to the above result:

 rn       s       t
---- ----------  ----
 1    2013-01-01  3
 2    2013-01-02  1
 3    2013-01-04  1
 4    2013-01-05  2
 5    2013-01-06  2
 6    2013-01-07  2
 7    2013-01-08  1
 8    2013-01-09  NULL

if we output all the fields from the query in cte3 without where condition, we get the following results:

select * from cte2 a left join cte2 b on a.rn = b.rn + 1 and a.t = b.t;

rn       s        t       rn      s          t
---- ----------  ----    ------ ----------  ----
1    2013-01-01  3       NULL   NULL        NULL
2    2013-01-02  1       NULL   NULL        NULL
3    2013-01-04  1       2      2013-01-02  1
4    2013-01-05  2       NULL   NULL        NULL
5    2013-01-06  2       4      2013-01-05  2
6    2013-01-07  2       5      2013-01-06  2
7    2013-01-08  1       NULL   NULL        NULL
8    2013-01-09  NULL    NULL   NULL        NULL

For the dates where type is repeted there are values on the right side of the results. So we can just remove all the lines where values exist on the right side.

So cte3 returns:

rn    s           t
----- ----------  ----
1     2013-01-01  3
2     2013-01-02  1
3     2013-01-05  2
4     2013-01-08  1
5     2013-01-09  NULL

Note that because of the removal some rows there are some gaps in rn sequence, so we have to renumber them again.

From here only one thing left - to transform the dates to periods:

select 
    s = a.s, 
    e = b.s, 
    a.t  
from 
    cte3 a 
    inner join cte3 b on b.rn = a.rn + 1;

and we've got the required result:

s          e          t
---------- ---------- ----
2013-01-01 2013-01-02 3
2013-01-02 2013-01-05 1
2013-01-05 2013-01-08 2
2013-01-08 2013-01-09 1

回答5:

this is your solution with a different data on the table..

    declare @periods table (
        s datetime primary key, 
        e datetime,
        t tinyint,
        period_number int   
    );

    insert into @periods (s, e, t) values
    ('2013-01-01' , '2013-01-02', 3),
    ('2013-01-02' , '2013-01-04', 1),
    ('2013-01-04' , '2013-01-05', 1),
    ('2013-01-05' , '2013-01-06', 2),
    ('2013-01-09' , '2013-01-10', 2),
    ('2013-01-10' , '2013-01-11', 1);

    declare @t tinyint = null;  
    declare @PeriodNumber int = 0;

    update @periods
        set  period_number = @PeriodNumber, 
        @PeriodNumber = case
                            when @t <> t
                                then  @PeriodNumber + 1
                            else
                                @PeriodNumber
                        end,
        @t = t;

    select 
        s = min(s),
        e = max(e),
        t = min(t)
    from 
        @periods    
    group by 
        period_number
    order by 
        s;

where have a gap between

 ('2013-01-05' , '2013-01-06', 2),
 --and
 ('2013-01-09' , '2013-01-10', 2),

your solution resultset is..

    s           e           t
    2013-01-01  2013-01-02  3
    2013-01-02  2013-01-05  1
    2013-01-05  2013-01-10  2
    2013-01-10  2013-01-11  1

isnt was spected the resultset like this..??

    s           e           t
    2013-01-01  2013-01-02  3
    2013-01-02  2013-01-05  1
    2013-01-05  2013-01-06  2
    2013-01-09  2013-01-10  2
    2013-01-10  2013-01-11  1

maybe I did misunderstood the rule of your problem...