How to identify the first gap in multiple start an

2019-06-20 17:42发布

问题:

I have been working on the below but getting no results and the deadline is fast approaching. Also, there are over a million rows as the below. Appreciate your help on the below.

Objective: Group results by MEMBER and build Continuous Coverage Ranges for each Member by combining individual Date Ranges which either overlap or run consecutive to each other with no breaks between the Start & End day of the range.

I have data in the below format:

MemberCode  -----   ClaimID   -----       StartDate   -----       EndDate
00001   -----       012345   -----       2010-01-15   -----       2010-01-20
00001   -----       012350   -----       2010-01-19   -----       2010-01-22
00001   -----       012352   -----       2010-01-20   -----       2010-01-25
00001   -----       012355   -----       2010-01-26   -----       2010-01-30
00002   -----       012357   -----       2010-01-20   -----       2010-01-25
00002   -----       012359   -----       2010-01-30   -----       2010-02-05
00002   -----       012360   -----       2010-02-04   -----       2010-02-15
00003   -----       012365   -----       2010-02-15   -----       2010-02-30

...

In the above the member (00001) is a valid member as there is a continuous date range from 2010-01-15 to 2010-01-30 (with no gaps). Please note that the Claim ID 012355 for this member starts immediately next to the End Date of Claim ID 012352. This is still valid as it forms a continuous range.

However, the member (00002) should be an Invalid member as there is a gap of 5 days between Enddate of Claim ID 012357 and Start Day for Claim ID 012359

What I am trying to do is get a list of ONLY those members who have claims for every single day of the continuous date range (for each member) with no gaps between the MIN(Start-date) and Max(End Date) for each Distinct member. Members who have gaps are discarded.

Thanks in advance.

UPDATE:

I have reached until here. Note: FILLED_DT = Start Date & PresCoverEndDT = End Date

SELECT PresCoverEndDT, FILLED_DT 

FROM 

(

    SELECT DISTINCT FILLED_DT, ROW_NUMBER() OVER (ORDER BY FILLED_DT) RN

    FROM Temp_Claims_PRIOR_STEP_5 T1

    WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

            WHERE T1.FILLED_DT > T2.FILLED_DT AND T1.FILLED_DT< T2.PresCoverEndDT 

            AND T1.MBR_KEY = T2.MBR_KEY )

) T1

    JOIN (SELECT DISTINCT PresCoverEndDT, ROW_NUMBER() OVER (ORDER BY PresCoverEndDT) RN

        FROM Temp_Claims_PRIOR_STEP_5 T1

        WHERE NOT EXISTS 

            (SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2

             WHERE T1.PresCoverEndDT > T2.FILLED_DT AND T1.PresCoverEndDT < T2.PresCoverEndDT AND T1.MBR_KEY = T2.MBR_KEY )
) T2

     ON T1.RN - 1 = T2.RN

WHERE   PresCoverEndDT < FILLED_DT 

The above code seems to have error as I am getting only one row and that too it is incorrect. My desired output is only 1 column as below:

Valid_Member_Code

00001

00007

00009

... etc.,

回答1:

Try this: http://www.sqlfiddle.com/#!3/c3365/20

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

See query progression here: http://www.sqlfiddle.com/#!3/c3365/20


How it works, compare the current end date to its next start date and check the date gap:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE | GAP |
--------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |  -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |  -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |   1 |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |   5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |  -1 |

Then check if a member has same count of claims with no gaps to its total claims:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode, count(*) as count, sum(case when gap <= 1 then 1 end) as gapless_count
from gaps
group by membercode;

Output:

| MEMBERCODE | COUNT | GAPLESS_COUNT |
--------------------------------------
|          1 |     3 |             3 |
|          2 |     2 |             1 |

Finally, filter them, members with no gaps in their claims:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);

Output:

| MEMBERCODE |
--------------
|          1 |

Do note that you don't need to do COUNT(*) > 1 to detect members with 2 or more claims. Instead of using LEFT JOIN, we uses JOIN, this will automatically discard members who have yet to have a second claim. Here's the version(longer) if you opt to use LEFT JOIN instead(same output as above):

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
and count(*) > 1; -- members who have two ore more claims only

Here's how see data of above query prior to filtering:

with s as
(
  select *, row_number() over(partition by membercode order by startdate) rn
  from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
  ,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select * from gaps;

Output:

| MEMBERCODE |  STARTDATE |    ENDDATE | NEXTSTARTDATE |    GAP |
-----------------------------------------------------------------
|          1 | 2010-01-15 | 2010-01-20 |    2010-01-19 |     -1 |
|          1 | 2010-01-19 | 2010-01-22 |    2010-01-20 |     -2 |
|          1 | 2010-01-20 | 2010-01-25 |    2010-01-26 |      1 |
|          1 | 2010-01-26 | 2010-01-30 |        (null) | (null) |
|          2 | 2010-01-20 | 2010-01-25 |    2010-01-30 |      5 |
|          2 | 2010-01-30 | 2010-02-05 |    2010-02-04 |     -1 |
|          2 | 2010-02-04 | 2010-02-15 |        (null) | (null) |
|          3 | 2010-02-15 | 2010-03-02 |        (null) | (null) |

EDIT on requirement clarification:

On your clarification, you wanted to include members who have yet to have second claim too, do this instead: http://sqlfiddle.com/#!3/c3365/22

with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode 
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
-- members who have yet to have a second claim are valid too
or count(nextstartdate) = 0; 

Output:

| MEMBERCODE |
--------------
|          1 |
|          3 |

The technique is to count the member's nextstartdate, if they have no next start date date(i.e. count(nextstartdate) = 0) then they are single claims only and valid too, then just attach this OR condition:

or count(nextstartdate) = 0; 

Actually, the condition below will suffice too, I wanted to make the query more self-documenting though, hence I recommend counting on member's nextstartdate. Here's an alternative condition for counting members who have yet to have a second claim:

or count(*) = 1;

Btw, we also have to change the comparison from this:

sum(case when gap <= 1 then 1 end) = count(*)

to this(as we are using LEFT JOIN now):

sum(case when gap <= 1 then 1 end) = count(gap)


回答2:

Try this, it partitions rows by MemberCode and gives them ordinal numbers. Then it compares rows with subsequent num value, if difference between end date of a row and start date of a next row is greater than one day, it's an invalid member:

DECLARE @t TABLE (MemberCode  VARCHAR(100), ClaimID   
    INT,StartDate   DATETIME,EndDate DATETIME)
INSERT @t
VALUES
('00001'   ,       012345   ,        '2010-01-15'   ,       '2010-01-20')
,('00001'   ,       012350   ,       '2010-01-19'   ,       '2010-01-22')
,('00001'   ,       012352   ,       '2010-01-20'   ,       '2010-01-25')
,('00001'   ,       012355   ,       '2010-01-26'   ,       '2010-01-30')
,('00002'   ,       012357   ,       '2010-01-20'   ,       '2010-01-25')
,('00002'   ,       012359   ,       '2010-01-30'   ,       '2010-02-05')
,('00002'   ,       012360   ,       '2010-02-04'   ,       '2010-02-15')
,('00003'   ,       012365   ,       '2010-02-15'   ,       '2010-02-28')
,('00004'   ,       012366   ,       '2010-03-18'   ,       '2010-03-23')
,('00005'   ,       012367   ,       '2010-03-19'   ,       '2010-03-25')
,('00006'   ,       012368   ,       '2010-03-20'   ,       '2010-03-21')

;WITH tbl AS (

    SELECT  *,
            ROW_NUMBER() OVER (PARTITION BY MemberCode ORDER BY StartDate) 
                AS num
    FROM    @t
), invalid AS (

    SELECT  tbl.MemberCode
    FROM    tbl
    JOIN    tbl _tbl ON 
            tbl.num = _tbl.num - 1
    AND     tbl.MemberCode = _tbl.MemberCode
    WHERE   DATEDIFF(DAY, tbl.EndDate, _tbl.StartDate) > 1  
)

SELECT  MemberCode
FROM    tbl
EXCEPT
SELECT  MemberCode
FROM    invalid


回答3:

I think your query gives back false negatives because it only checks the time interval between consecutive rows. In my opinion, it is possible that the gap is compensated by one of the previous lines. Let me give an example:

Row l: 2010-01-01 | 2010-01-31
Row 2: 2010-01-10 | 2010-01-15
Row 3: 2010-01-20 | 2010-01-25

Your code will report a gap between row 2 and row 3, while it is being filled by row 1. Your code will not detect this. You should use the MAX(EndDate) of all previous rows in the DATEDIFF function.

DECLARE @t TABLE (PersonID  VARCHAR(100), StartDate DATETIME, EndDate DATETIME)
INSERT @t VALUES('00001'   ,       '2010-01-01'   ,       '2010-01-17')
INSERT @t VALUES('00001'   ,       '2010-01-19'   ,       '2010-01-22')
INSERT @t VALUES('00001'   ,       '2010-01-20'   ,       '2010-01-25')
INSERT @t VALUES('00001'   ,       '2010-01-26'   ,       '2010-01-31')
INSERT @t VALUES('00002'   ,       '2010-01-20'   ,       '2010-01-25')
INSERT @t VALUES('00002'   ,       '2010-02-04'   ,       '2010-02-05')
INSERT @t VALUES('00002'   ,       '2010-02-04'   ,       '2010-02-15')
INSERT @t VALUES('00003'   ,       '2010-02-15'   ,       '2010-02-28')
INSERT @t VALUES('00004'   ,       '2010-03-18'   ,       '2010-03-23')
INSERT @t VALUES('00005'   ,       '2010-03-19'   ,       '2010-03-25')
INSERT @t VALUES('00006'   ,       '2010-01-01'   ,       '2010-04-20')
INSERT @t VALUES('00006'   ,       '2010-01-20'   ,       '2010-01-21')
INSERT @t VALUES('00006'   ,       '2010-01-25'   ,       '2010-01-26')

;WITH tbl AS (
    SELECT  
        *, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY StartDate) AS num
    FROM    @t
), invalid AS (
    SELECT  tbl.PersonID 
    FROM    tbl
    JOIN    tbl _tbl ON 
            tbl.num = _tbl.num - 1 AND tbl.PersonID = _tbl.PersonID
    WHERE DATEDIFF(DAY, (SELECT MAX(tbl3.EndDate) FROM tbl tbl3 WHERE tbl3.num <= tbl.num AND tbl3.PersonID = tbl.PersonID), _tbl.StartDate) > 1  
)

SELECT  PersonID
FROM    tbl
EXCEPT
SELECT  PersonID
FROM    invalid