I have a table with the following data:
If the dates overlap for a Type, I want to return a separate row for the overlapping period so end up with the following results:
I have a table with the following data:
If the dates overlap for a Type, I want to return a separate row for the overlapping period so end up with the following results:
Consider treating the ranges as geometric lines and then using SQL Server's geometry tools.
First, I need to get your data into a table I can work with:
declare @spans table (
type char(1),
start datetime,
stop datetime
);
insert @spans values
('S', '2010-01-01', '2010-01-14'),
('S', '2010-01-10', '2010-01-31'),
('A', '2010-01-05', '2010-01-30'),
('A', '2010-01-24', '2010-02-06'),
('T', '2010-01-20', '2010-01-27'),
('T', '2010-01-28', '2010-01-30');
Unfortunately, SQL Server's spatial tools are limited on one respect. When you need to query individual shapes or points inside a 'geometry' value, you have to pass the index number to get it. Until microsoft comes out with a TVF that outputs all the shapes, we resort to a numbers table to get the job done.
Swap this out with your preferred method to create a numbers table:
declare @numbers table (i int);
insert @numbers
select i = row_number() over (order by (select null))
from @spans a, @spans b;
Now you're ready for the main query.
In 'geoAggregates':
In the outer query:
Here' the code. It outputs as you expect, except for your type-o in the second row of 'A'.
with
geoAggregates as (
select type,
lines = geometry::UnionAggregate(line),
splitters = geometry::UnionAggregate(splitters)
from @spans
cross apply (select
startF = convert(float, start),
stopF = convert(float, stop) + 1
) prepare
cross apply (select
startP = geometry::Point(startF, 0, 0),
stopP = geometry::Point(stopF, 0, 0)
) pointify
cross apply (select
line = startP.STUnion(stopP).STEnvelope(),
splitters = startP.STUnion(stopP)
) lineify
group by type
)
select type,
start,
stop
from geoAggregates ga
cross apply (select
splitted = ga.lines.STDifference(splitters.STBuffer(0.001))
) sp
join @numbers n on n.i between 1 and sp.splitted.STNumGeometries()
cross apply (select
line = sp.splitted.STGeometryN(i).STEnvelope()
) l
cross apply (select
start = convert(datetime, round(l.line.STPointN(1).STX,0)),
stop = convert(datetime, round(l.line.STPointN(3).STX - 1,0))
) dateify
order by type, start;
I'm ignoring PKey
for now since I'm not sure if it's actually relevant to the question.
This solves the problem:
declare @t table (PKey int,Start date,[End] date,Type char(1))
insert into @t(PKey,Start,[End],Type) values
(1,'20100101','20100114','S'),
(2,'20100110','20100131','S'),
(3,'20100105','20100130','A'),
(4,'20100124','20100206','A'),
(5,'20100120','20100127','T'),
(6,'20100128','20100130','T')
;With EndDates as (
select [End],Type from @t
union all
select DATEADD(day,-1,Start),Type from @t
), Periods as (
select Type,MIN(Start) as Start,
(select MIN([End]) from EndDates e
where e.Type = t.Type and
e.[End] >= MIN(Start)) as [End]
from
@t t
group by Type
union all
select p.Type,DATEADD(day,1,p.[End]),e.[End]
from
Periods p
inner join
EndDates e
on
p.Type = e.Type and
p.[End] < e.[End]
where
not exists (select * from EndDates e2 where
e2.Type = p.Type and
e2.[End] > p.[End] and
e2.[End] < e.[End])
)
select * from Periods
order by Type,Start
First, we create a CTE called EndDates
that contains all dates which might be the end of a period - these are either the end dates that we already have in our data, or they're the day before one of the start dates in our data.
We then build up the periods - first we find the first period for any particular type - which is we take the earliest start date, and the earliest possible end date that comes after the start date.
Then, recursively, we build up additional periods by starting new periods the day after existing ones have ended, and finding the earliest end date that follows that date.
And then, basically, we're done. Results:
Type Start End
---- ---------- ----------
A 2010-01-05 2010-01-23
A 2010-01-24 2010-01-30
A 2010-01-31 2010-02-06
S 2010-01-01 2010-01-09
S 2010-01-10 2010-01-14
S 2010-01-15 2010-01-31
T 2010-01-20 2010-01-27
T 2010-01-28 2010-01-30
Which doesn't exactly match what's in your question but I assume the A
row ending on 30th February was a typo.
(I'd recommend renaming your End
column though, because using reserved words for column names can turn into a real pain)