可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Say we have such a table:
declare @periods table (
s date,
e date,
t tinyint
);
with date intervals without gaps ordered by start date (s)
insert into @periods values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);
All date intervals have different types (t).
It is required to combine date intervals of the same type where they are not broken by intervals of the other types (having all intervals ordered by start date).
So the result table should look like:
s | e | t
------------|------------|-----
2013-01-01 | 2013-01-02 | 3
2013-01-02 | 2013-01-05 | 1
2013-01-05 | 2013-01-08 | 2
2013-01-08 | 2013-01-09 | 1
Any ideas how to do this without cursor?
I've got one working solution:
declare @periods table (
s datetime primary key clustered,
e datetime,
t tinyint,
period_number int
);
insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);
declare @t tinyint = null;
declare @PeriodNumber int = 0;
declare @anchor date;
update @periods
set period_number = @PeriodNumber,
@PeriodNumber = case
when @t <> t
then @PeriodNumber + 1
else
@PeriodNumber
end,
@t = t,
@anchor = s
option (maxdop 1);
select
s = min(s),
e = max(e),
t = min(t)
from
@periods
group by
period_number
order by
s;
but I doubt if I can rely on such a behavior of UPDATE statement?
I use SQL Server 2008 R2.
Edit:
Thanks to Daniel and this article: http://www.sqlservercentral.com/articles/T-SQL/68467/
I found three important things that were missed in the solution above:
- There must be clustered index on the table
- There must be anchor variable and call of the clustered column
- Update statement should be executed by one processor, i.e. without parallelism
I've changed the above solution in accordance with these rules.
回答1:
Since your ranges are continuous, the problem essentially becomes a gaps-and-islands one. If only you had a criterion to help you to distinguish between different sequences with the same t
value, you could group all the rows using that criterion, then just take MIN(s), MAX(e)
for every group.
One method of obtaining such a criterion is to use two ROW_NUMBER
calls. Consider the following query:
SELECT
*,
rnk1 = ROW_NUMBER() OVER ( ORDER BY s),
rnk2 = ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
FROM @periods
;
For your example it would return the following set:
s e t rnk1 rnk2
---------- ---------- -- ---- ----
2013-01-01 2013-01-02 3 1 1
2013-01-02 2013-01-04 1 2 1
2013-01-04 2013-01-05 1 3 2
2013-01-05 2013-01-06 2 4 1
2013-01-06 2013-01-07 2 5 2
2013-01-07 2013-01-08 2 6 3
2013-01-08 2013-01-09 1 7 3
The interesting thing about the rnk1
and rnk2
rankings is that if you subtract one from the other, you will get values that, together with t
, uniquely identify every distinct sequence of rows with the same t
:
s e t rnk1 rnk2 rnk1 - rnk2
---------- ---------- -- ---- ---- -----------
2013-01-01 2013-01-02 3 1 1 0
2013-01-02 2013-01-04 1 2 1 1
2013-01-04 2013-01-05 1 3 2 1
2013-01-05 2013-01-06 2 4 1 3
2013-01-06 2013-01-07 2 5 2 3
2013-01-07 2013-01-08 2 6 3 3
2013-01-08 2013-01-09 1 7 3 4
Knowing that, you can easily apply grouping and aggregation. This is what the final query might look like:
WITH partitioned AS (
SELECT
*,
g = ROW_NUMBER() OVER ( ORDER BY s)
- ROW_NUMBER() OVER (PARTITION BY t ORDER BY s)
FROM @periods
)
SELECT
s = MIN(s),
e = MAX(e),
t
FROM partitioned
GROUP BY
t,
g
;
If you like, you can play with this solution at SQL Fiddle.
回答2:
How about this?
declare @periods table (
s datetime primary key,
e datetime,
t tinyint,
s2 datetime
);
insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);
update @periods set s2 = s;
while @@ROWCOUNT > 0
begin
update p2 SET s2=p1.s
from @periods p1
join @PERIODS P2 ON p2.t = p1.t AND p2.s2 = p1.e;
end
select s2 as s, max(e) as e, min(t) as t
from @periods
group by s2
order by s2;
回答3:
a possibly solution to avoid update and cursor should be using common table expressions...
like this...
declare @periods table (
s date,
e date,
t tinyint
);
insert into @periods values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-06' , '2013-01-07', 2),
('2013-01-07' , '2013-01-08', 2),
('2013-01-08' , '2013-01-09', 1);
with cte as ( select 0 as n
,p.s as s
,p.e as e
,p.t
,case when p2.s is null then 1 else 0 end fl_s
,case when p3.e is null then 1 else 0 end fl_e
from @periods p
left outer join @periods p2
on p2.e = p.s
and p2.t = p.t
left outer join @periods p3
on p3.s = p.e
and p3.t = p.t
union all
select n+1 as n
, p2.s as s
, p.e as e
,p.t
,case when not exists(select * from @periods p3 where p3.e =p2.s and p3.t=p2.t) then 1 else 0 end as fl_s
,p.fl_e as fl_e
from cte p
inner join @periods p2
on p2.e = p.s
and p2.t = p.t
where p.fl_s = 0
union all
select n+1 as n
, p.s as s
, p2.e as e
,p.t
,p.fl_s as fl_s
,case when not exists(select * from @periods p3 where p3.s =p2.e and p3.t=p2.t) then 1 else 0 end as fl_e
from cte p
inner join @periods p2
on p2.s = p.e
and p2.t = p.t
where p.fl_s = 1
and p.fl_e = 0
)
,result as (select s,e,t,COUNT(*) as count_lines
from cte
where fl_e = 1
and fl_s = 1
group by s,e,t
)
select * from result
option(maxrecursion 0)
resultset achieved...
s e t count_lines
2013-01-01 2013-01-02 3 1
2013-01-02 2013-01-05 1 2
2013-01-05 2013-01-08 2 3
2013-01-08 2013-01-09 1 1
回答4:
Hooray! I've found the solution that suits me and it is done without iteration
with cte1 as (
select s, t from @periods
union all
select max(e), null from @periods
),
cte2 as (
select rn = row_number() over(order by s), s, t from cte1
),
cte3 as (
select
rn = row_number() over(order by a.rn),
a.s,
a.t
from
cte2 a
left join cte2 b on a.rn = b.rn + 1 and a.t = b.t
where
b.rn is null
)
select
s = a.s,
e = b.s,
a.t
from
cte3 a
inner join cte3 b on b.rn = a.rn + 1;
Thanks everyone for sharing your thoughts and solutions!
Details:
cte1 returns the chain of dates with the types after them:
s t
---------- ----
2013-01-01 3
2013-01-02 1
2013-01-04 1
2013-01-05 2
2013-01-06 2
2013-01-07 2
2013-01-08 1
2013-01-09 NULL -- there is no type *after* the last date
ct2 just add row number to the above result:
rn s t
---- ---------- ----
1 2013-01-01 3
2 2013-01-02 1
3 2013-01-04 1
4 2013-01-05 2
5 2013-01-06 2
6 2013-01-07 2
7 2013-01-08 1
8 2013-01-09 NULL
if we output all the fields from the query in cte3 without where condition, we get the following results:
select * from cte2 a left join cte2 b on a.rn = b.rn + 1 and a.t = b.t;
rn s t rn s t
---- ---------- ---- ------ ---------- ----
1 2013-01-01 3 NULL NULL NULL
2 2013-01-02 1 NULL NULL NULL
3 2013-01-04 1 2 2013-01-02 1
4 2013-01-05 2 NULL NULL NULL
5 2013-01-06 2 4 2013-01-05 2
6 2013-01-07 2 5 2013-01-06 2
7 2013-01-08 1 NULL NULL NULL
8 2013-01-09 NULL NULL NULL NULL
For the dates where type is repeted there are values on the right side of the results. So we can just remove all the lines where values exist on the right side.
So cte3 returns:
rn s t
----- ---------- ----
1 2013-01-01 3
2 2013-01-02 1
3 2013-01-05 2
4 2013-01-08 1
5 2013-01-09 NULL
Note that because of the removal some rows there are some gaps in rn sequence, so we have to renumber them again.
From here only one thing left - to transform the dates to periods:
select
s = a.s,
e = b.s,
a.t
from
cte3 a
inner join cte3 b on b.rn = a.rn + 1;
and we've got the required result:
s e t
---------- ---------- ----
2013-01-01 2013-01-02 3
2013-01-02 2013-01-05 1
2013-01-05 2013-01-08 2
2013-01-08 2013-01-09 1
回答5:
this is your solution with a different data on the table..
declare @periods table (
s datetime primary key,
e datetime,
t tinyint,
period_number int
);
insert into @periods (s, e, t) values
('2013-01-01' , '2013-01-02', 3),
('2013-01-02' , '2013-01-04', 1),
('2013-01-04' , '2013-01-05', 1),
('2013-01-05' , '2013-01-06', 2),
('2013-01-09' , '2013-01-10', 2),
('2013-01-10' , '2013-01-11', 1);
declare @t tinyint = null;
declare @PeriodNumber int = 0;
update @periods
set period_number = @PeriodNumber,
@PeriodNumber = case
when @t <> t
then @PeriodNumber + 1
else
@PeriodNumber
end,
@t = t;
select
s = min(s),
e = max(e),
t = min(t)
from
@periods
group by
period_number
order by
s;
where have a gap between
('2013-01-05' , '2013-01-06', 2),
--and
('2013-01-09' , '2013-01-10', 2),
your solution resultset is..
s e t
2013-01-01 2013-01-02 3
2013-01-02 2013-01-05 1
2013-01-05 2013-01-10 2
2013-01-10 2013-01-11 1
isnt was spected the resultset like this..??
s e t
2013-01-01 2013-01-02 3
2013-01-02 2013-01-05 1
2013-01-05 2013-01-06 2
2013-01-09 2013-01-10 2
2013-01-10 2013-01-11 1
maybe I did misunderstood the rule of your problem...