I have a large data set which for the purpose of this question has 3 fields:
- Group Identifier
- From Date
- To Date
On any given row the From Date will always be less than the To Date, but within each group the time periods (which are in no particular order) represented by the date pairs could overlap, be contained one within another, or even be identical.
What I'd like to end up with is a query that condenses the results for each group down to just the continuous periods. For example, a group that looks like this:
| Group ID | From Date | To Date |
--------------------------------------
| A | 01/01/2012 | 12/31/2012 |
| A | 12/01/2013 | 11/30/2014 |
| A | 01/01/2015 | 12/31/2015 |
| A | 01/01/2015 | 12/31/2015 |
| A | 02/01/2015 | 03/31/2015 |
| A | 01/01/2013 | 12/31/2013 |
Would result in this:
| Group ID | From Date | To Date |
--------------------------------------
| A | 01/01/2012 | 11/30/2014 |
| A | 01/01/2015 | 12/31/2015 |
I've read a number of articles on date packing but I can't quite figure out how to apply that to my data set.
How can I construct a query that would give me those results?
The solution from the book "Microsoft® SQL Server® 2012 High-Performance T-SQL Using Window Functions"
Create table:
I'd use a Calendar table. This table simply has a list of dates for several decades. There are many ways to populate such a table.
For example, 100K rows (~270 years) from 1900-01-01:
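As a minimal sketch of one way to do this (not necessarily the method originally posted), the following creates and fills such a table by cross-joining a small row set. The table name dbo.Calendar and the column name dt are assumptions chosen for illustration.

```sql
-- Hypothetical names: dbo.Calendar and its column dt are assumptions.
CREATE TABLE dbo.Calendar
(
    dt date NOT NULL PRIMARY KEY
);

-- Cross-joining a 10-row set five times yields 10^5 = 100,000 rows.
WITH
e1(n) AS
(
    SELECT 1
    FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)
),
e5(n) AS
(
    SELECT 1
    FROM e1 AS a
    CROSS JOIN e1 AS b
    CROSS JOIN e1 AS c
    CROSS JOIN e1 AS d
    CROSS JOIN e1 AS e
),
numbers(n) AS
(
    -- Zero-based row number: 0 .. 99,999
    SELECT CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS int)
    FROM e5
)
INSERT INTO dbo.Calendar (dt)
SELECT DATEADD(day, n, CAST('19000101' AS date))  -- one row per day from 1900-01-01
FROM numbers;
```

The primary key on dt gives the range join used later an index to seek on.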
Once you have a Calendar table, here is how to use it. Each original row is joined with the Calendar table to return as many rows as there are dates between From and To. Then possible duplicates are removed.
Then apply the classic gaps-and-islands technique by numbering the rows in two sequences.
Then group the found islands together to get the new From and To.
Sample data
I added a second group.
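A sketch of what such sample data could look like. The temp table name #Periods and its column names are assumptions. The group A rows are copied from the question, while the group B rows are invented here purely for illustration and are not the values originally posted.

```sql
-- Hypothetical temp table; names are assumptions for illustration.
CREATE TABLE #Periods
(
    GroupID  char(1) NOT NULL,
    FromDate date    NOT NULL,
    ToDate   date    NOT NULL
);

INSERT INTO #Periods (GroupID, FromDate, ToDate)
VALUES
    -- Group A: the rows from the question
    ('A', '20120101', '20121231'),
    ('A', '20131201', '20141130'),
    ('A', '20150101', '20151231'),
    ('A', '20150101', '20151231'),
    ('A', '20150201', '20150331'),
    ('A', '20130101', '20131231'),
    -- Group B: invented rows (two overlapping periods, then a gap)
    ('B', '20120101', '20120331'),
    ('B', '20120301', '20120630'),
    ('B', '20120901', '20121231');
```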
Query
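The steps described above can be sketched roughly as follows. This is an illustration rather than the exact query originally posted. It assumes a source table #Periods(GroupID, FromDate, ToDate) and a calendar table dbo.Calendar(dt) with one row per day, and all of those names are hypothetical.

```sql
WITH
-- Step 1: expand each period into one row per covered day.
-- DISTINCT removes days covered by more than one period.
CTE_Dates AS
(
    SELECT DISTINCT p.GroupID, c.dt
    FROM #Periods AS p
    INNER JOIN dbo.Calendar AS c
        ON c.dt >= p.FromDate
       AND c.dt <= p.ToDate
),
-- Step 2: two sequences per group. The day number grows by 1 per calendar
-- day, and ROW_NUMBER grows by 1 per existing row. Their difference stays
-- constant within a run of consecutive days and changes after every gap.
CTE_Islands AS
(
    SELECT
        GroupID,
        dt,
        DATEDIFF(day, '20000101', dt)
            - ROW_NUMBER() OVER (PARTITION BY GroupID ORDER BY dt) AS IslandID
    FROM CTE_Dates
)
-- Step 3: collapse each island to its first and last day.
SELECT
    GroupID,
    MIN(dt) AS FromDate,
    MAX(dt) AS ToDate
FROM CTE_Islands
GROUP BY GroupID, IslandID
ORDER BY GroupID, FromDate;
```

Because the comparison is made per calendar day, periods that merely touch (one ends on 12/31/2012 and the next starts on 01/01/2013) fall into the same island, which is what the expected output for group A in the question requires.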
Result
A Geometric Approach
Here and elsewhere I've noticed that answers to date packing questions don't offer a geometric approach to this problem. After all, any range, date ranges included, can be interpreted as a line. So why not convert them to a SQL Server geometry type and use geometry::UnionAggregate to merge the ranges? I took a stab at it with the data from your post.
Code Description
In 'numbers':
In 'mergeLines':
In the outer query:
The Code