Group Non-Contiguous Dates By Criteria In Column

2019-07-30 04:21发布

问题:

I have a table with start and end dates for team consultations with customers.

I need to merge certain consultations based on a number of days specified in another column (sometimes the consultations may overlap, sometimes they are contiguous, sometimes they arent), Team and Type.

Some example data is as follows:

DECLARE @TempTable TABLE([CUSTOMER_ID] INT
                        ,[TEAM] VARCHAR(1)
                        ,[TYPE] VARCHAR(1)
                        ,[START_DATE] DATETIME
                        ,[END_DATE] DATETIME
                        ,[GROUP_DAYS_CRITERIA] INT)

INSERT INTO @TempTable VALUES (1,'A','A','2013-08-07','2013-12-31',28)
                             ,(2,'B','A','2015-05-15','2015-05-28',28)
                             ,(2,'B','A','2015-05-15','2016-05-12',28)
                             ,(2,'B','A','2015-05-28','2015-05-28',28)
                             ,(3,'C','A','2013-05-27','2014-07-23',28)
                             ,(3,'C','A','2015-01-12','2015-05-28',28)
                             ,(3,'B','A','2015-01-12','2015-05-28',28)
                             ,(3,'C','A','2015-05-28','2015-05-28',28)
                             ,(3,'C','A','2015-05-28','2015-12-17',28)
                             ,(4,'A','B','2013-07-09','2014-04-21',7)
                             ,(4,'A','B','2014-04-29','2014-08-01',7)

Which looks like this:

+-------------+------+------+------------+------------+---------------------+
| CUSTOMER_ID | TEAM | TYPE | START_DATE |  END_DATE  | GROUP_DAYS_CRITERIA |
+-------------+------+------+------------+------------+---------------------+
|           1 | A    | A    | 07/08/2013 | 31/12/2013 |                  28 |
|           2 | B    | A    | 15/05/2015 | 28/05/2015 |                  28 |
|           2 | B    | A    | 15/05/2015 | 12/05/2016 |                  28 |
|           2 | B    | A    | 28/05/2015 | 28/05/2015 |                  28 |
|           3 | C    | A    | 27/05/2013 | 23/07/2014 |                  28 |
|           3 | C    | A    | 12/01/2015 | 28/05/2015 |                  28 |
|           3 | B    | A    | 12/01/2015 | 28/05/2015 |                  28 |
|           3 | C    | A    | 28/05/2015 | 28/05/2015 |                  28 |
|           3 | C    | A    | 28/05/2015 | 17/12/2015 |                  28 |
|           4 | A    | B    | 09/07/2013 | 21/04/2014 |                   7 |
|           4 | A    | B    | 29/04/2014 | 01/08/2014 |                   7 |
+-------------+------+------+------------+------------+---------------------+

My desired output is as follows:

+-------------+------+------+------------+------------+---------------------+
| CUSTOMER_ID | TEAM | TYPE | START_DATE |  END_DATE  | GROUP_DAYS_CRITERIA |
+-------------+------+------+------------+------------+---------------------+
|           1 | A    | A    | 07/08/2013 | 31/12/2013 |                  28 |
|           2 | B    | A    | 15/05/2015 | 12/05/2016 |                  28 |
|           3 | C    | A    | 27/05/2013 | 23/07/2014 |                  28 |
|           3 | C    | A    | 12/01/2015 | 17/12/2015 |                  28 |
|           3 | B    | A    | 12/01/2015 | 28/05/2015 |                  28 |
|           4 | A    | B    | 09/07/2013 | 21/04/2014 |                   7 |
|           4 | A    | B    | 29/04/2014 | 01/08/2014 |                   7 |
+-------------+------+------+------------+------------+---------------------+

I am struggling to do this at all, let alone with any efficiency! Any ideas / code will be greatly received.

Server version is MS SQL Server 2014

Thanks,

Dan

回答1:

If I am understanding your question correctly, we want to return rows only when a second, third, etc consultation has not occurred within group_days_criteria number of days after the previous consultation end date.

We can get the previous consultation end date and eliminate rows (since we are not concerned with the number of consultations) where a consultation occurred for the same customer by the same team and of the same consultation type within our date range.

DECLARE @TempTable TABLE([CUSTOMER_ID] INT
                    ,[TEAM] VARCHAR(1)
                    ,[TYPE] VARCHAR(1)
                    ,[START_DATE] DATETIME
                    ,[END_DATE] DATETIME
                    ,[GROUP_DAYS_CRITERIA] INT)

INSERT INTO @TempTable VALUES (1,'A','A','2013-08-07','2013-12-31',28)
                         ,(2,'B','A','2015-05-15','2015-05-28',28)
                         ,(2,'B','A','2015-05-15','2016-05-12',28)
                         ,(2,'B','A','2015-05-28','2015-05-28',28)
                         ,(3,'C','A','2013-05-27','2014-07-23',28)
                         ,(3,'C','A','2015-01-12','2015-05-28',28)
                         ,(3,'B','A','2015-01-12','2015-05-28',28)
                         ,(3,'C','A','2015-05-28','2015-05-28',28)
                         ,(3,'C','A','2015-05-28','2015-12-17',28)
                         ,(4,'A','B','2013-07-09','2014-04-21',7)
                         ,(4,'A','B','2014-04-29','2014-08-01',7)

;with prep as (
select  Customer_ID,
        Team,
        [Type],
        [Start_Date],
        [End_Date],
        Group_Days_Criteria,
        ROW_NUMBER() over (partition by customer_id, team, [type] order by [start_date] asc, [end_date] desc) as rn, -- earliest start date with latest end date
        lag([End_Date] + Group_Days_Criteria, 1, 0) over (partition by customer_id, team, [type] order by [start_date] asc, [end_date] desc) as PreviousEndDate -- previous end date +
from @TempTable
)

select  p.Customer_Id,
        p.[Team],
        p.[Type],
        p.[Start_Date],
        p.[End_Date],
        p.Group_Days_Criteria
from prep p
where p.rn = 1 
    or (p.rn != 1 and p.[Start_date] > p.PreviousEndDate)
order by p.Customer_Id, p.[Team], p.[Start_Date], p.[Type]

This returned the desired result set.