I have been working on the below but getting no results and the deadline is fast approaching. Also, there are over a million rows as the below. Appreciate your help on the below.
Objective: Group results by MEMBER and build Continuous Coverage Ranges for each Member by combining individual Date Ranges which either overlap or run consecutive to each other with no breaks between the Start & End day of the range.
I have data in the below format:
MemberCode ----- ClaimID ----- StartDate ----- EndDate
00001 ----- 012345 ----- 2010-01-15 ----- 2010-01-20
00001 ----- 012350 ----- 2010-01-19 ----- 2010-01-22
00001 ----- 012352 ----- 2010-01-20 ----- 2010-01-25
00001 ----- 012355 ----- 2010-01-26 ----- 2010-01-30
00002 ----- 012357 ----- 2010-01-20 ----- 2010-01-25
00002 ----- 012359 ----- 2010-01-30 ----- 2010-02-05
00002 ----- 012360 ----- 2010-02-04 ----- 2010-02-15
00003 ----- 012365 ----- 2010-02-15 ----- 2010-02-30
...
In the above the member (00001) is a valid member as there is a continuous date range from 2010-01-15 to 2010-01-30 (with no gaps). Please note that the Claim ID 012355 for this member starts immediately next to the End Date of Claim ID 012352. This is still valid as it forms a continuous range.
However, the member (00002) should be an Invalid member as there is a gap of 5 days between Enddate of Claim ID 012357 and Start Day for Claim ID 012359
What I am trying to do is get a list of ONLY those members who have claims for every single day of the continuous date range (for each member) with no gaps between the MIN(Start-date) and Max(End Date) for each Distinct member. Members who have gaps are discarded.
Thanks in advance.
UPDATE:
I have reached until here.
Note: FILLED_DT = Start Date & PresCoverEndDT = End Date
SELECT PresCoverEndDT, FILLED_DT
FROM
(
SELECT DISTINCT FILLED_DT, ROW_NUMBER() OVER (ORDER BY FILLED_DT) RN
FROM Temp_Claims_PRIOR_STEP_5 T1
WHERE NOT EXISTS
(SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2
WHERE T1.FILLED_DT > T2.FILLED_DT AND T1.FILLED_DT< T2.PresCoverEndDT
AND T1.MBR_KEY = T2.MBR_KEY )
) T1
JOIN (SELECT DISTINCT PresCoverEndDT, ROW_NUMBER() OVER (ORDER BY PresCoverEndDT) RN
FROM Temp_Claims_PRIOR_STEP_5 T1
WHERE NOT EXISTS
(SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2
WHERE T1.PresCoverEndDT > T2.FILLED_DT AND T1.PresCoverEndDT < T2.PresCoverEndDT AND T1.MBR_KEY = T2.MBR_KEY )
) T2
ON T1.RN - 1 = T2.RN
WHERE PresCoverEndDT < FILLED_DT
The above code seems to have error as I am getting only one row and that too it is incorrect. My desired output is only 1 column as below:
Valid_Member_Code
00001
00007
00009
... etc.,
Try this: http://www.sqlfiddle.com/#!3/c3365/20
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);
See query progression here: http://www.sqlfiddle.com/#!3/c3365/20
How it works, compare the current end date to its next start date and check the date gap:
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1;
Output:
| MEMBERCODE | STARTDATE | ENDDATE | NEXTSTARTDATE | GAP |
--------------------------------------------------------------
| 1 | 2010-01-15 | 2010-01-20 | 2010-01-19 | -1 |
| 1 | 2010-01-19 | 2010-01-22 | 2010-01-20 | -2 |
| 1 | 2010-01-20 | 2010-01-25 | 2010-01-26 | 1 |
| 2 | 2010-01-20 | 2010-01-25 | 2010-01-30 | 5 |
| 2 | 2010-01-30 | 2010-02-05 | 2010-02-04 | -1 |
Then check if a member has same count of claims with no gaps to its total claims:
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode, count(*) as count, sum(case when gap <= 1 then 1 end) as gapless_count
from gaps
group by membercode;
Output:
| MEMBERCODE | COUNT | GAPLESS_COUNT |
--------------------------------------
| 1 | 3 | 3 |
| 2 | 2 | 1 |
Finally, filter them, members with no gaps in their claims:
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);
Output:
| MEMBERCODE |
--------------
| 1 |
Do note that you don't need to do COUNT(*) > 1
to detect members with 2 or more claims. Instead of using LEFT JOIN
, we uses JOIN
, this will automatically discard members who have yet to have a second claim. Here's the version(longer) if you opt to use LEFT JOIN
instead(same output as above):
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
and count(*) > 1; -- members who have two ore more claims only
Here's how see data of above query prior to filtering:
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select * from gaps;
Output:
| MEMBERCODE | STARTDATE | ENDDATE | NEXTSTARTDATE | GAP |
-----------------------------------------------------------------
| 1 | 2010-01-15 | 2010-01-20 | 2010-01-19 | -1 |
| 1 | 2010-01-19 | 2010-01-22 | 2010-01-20 | -2 |
| 1 | 2010-01-20 | 2010-01-25 | 2010-01-26 | 1 |
| 1 | 2010-01-26 | 2010-01-30 | (null) | (null) |
| 2 | 2010-01-20 | 2010-01-25 | 2010-01-30 | 5 |
| 2 | 2010-01-30 | 2010-02-05 | 2010-02-04 | -1 |
| 2 | 2010-02-04 | 2010-02-15 | (null) | (null) |
| 3 | 2010-02-15 | 2010-03-02 | (null) | (null) |
EDIT on requirement clarification:
On your clarification, you wanted to include members who have yet to have second claim too, do this instead: http://sqlfiddle.com/#!3/c3365/22
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
-- members who have yet to have a second claim are valid too
or count(nextstartdate) = 0;
Output:
| MEMBERCODE |
--------------
| 1 |
| 3 |
The technique is to count the member's nextstartdate
, if they have no next start date date(i.e. count(nextstartdate) = 0
) then they are single claims only and valid too, then just attach this OR
condition:
or count(nextstartdate) = 0;
Actually, the condition below will suffice too, I wanted to make the query more self-documenting though, hence I recommend counting on member's nextstartdate. Here's an alternative condition for counting members who have yet to have a second claim:
or count(*) = 1;
Btw, we also have to change the comparison from this:
sum(case when gap <= 1 then 1 end) = count(*)
to this(as we are using LEFT JOIN
now):
sum(case when gap <= 1 then 1 end) = count(gap)
Try this, it partitions rows by MemberCode
and gives them ordinal numbers. Then it compares rows with subsequent num
value, if difference between end date of a row and start date of a next row is greater than one day, it's an invalid member:
DECLARE @t TABLE (MemberCode VARCHAR(100), ClaimID
INT,StartDate DATETIME,EndDate DATETIME)
INSERT @t
VALUES
('00001' , 012345 , '2010-01-15' , '2010-01-20')
,('00001' , 012350 , '2010-01-19' , '2010-01-22')
,('00001' , 012352 , '2010-01-20' , '2010-01-25')
,('00001' , 012355 , '2010-01-26' , '2010-01-30')
,('00002' , 012357 , '2010-01-20' , '2010-01-25')
,('00002' , 012359 , '2010-01-30' , '2010-02-05')
,('00002' , 012360 , '2010-02-04' , '2010-02-15')
,('00003' , 012365 , '2010-02-15' , '2010-02-28')
,('00004' , 012366 , '2010-03-18' , '2010-03-23')
,('00005' , 012367 , '2010-03-19' , '2010-03-25')
,('00006' , 012368 , '2010-03-20' , '2010-03-21')
;WITH tbl AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY MemberCode ORDER BY StartDate)
AS num
FROM @t
), invalid AS (
SELECT tbl.MemberCode
FROM tbl
JOIN tbl _tbl ON
tbl.num = _tbl.num - 1
AND tbl.MemberCode = _tbl.MemberCode
WHERE DATEDIFF(DAY, tbl.EndDate, _tbl.StartDate) > 1
)
SELECT MemberCode
FROM tbl
EXCEPT
SELECT MemberCode
FROM invalid
I think your query gives back false negatives because it only checks the time interval between consecutive rows. In my opinion, it is possible that the gap is compensated by one of the previous lines. Let me give an example:
Row l: 2010-01-01 | 2010-01-31
Row 2: 2010-01-10 | 2010-01-15
Row 3: 2010-01-20 | 2010-01-25
Your code will report a gap between row 2 and row 3, while it is being filled by row 1. Your code will not detect this.
You should use the MAX(EndDate) of all previous rows in the DATEDIFF function.
DECLARE @t TABLE (PersonID VARCHAR(100), StartDate DATETIME, EndDate DATETIME)
INSERT @t VALUES('00001' , '2010-01-01' , '2010-01-17')
INSERT @t VALUES('00001' , '2010-01-19' , '2010-01-22')
INSERT @t VALUES('00001' , '2010-01-20' , '2010-01-25')
INSERT @t VALUES('00001' , '2010-01-26' , '2010-01-31')
INSERT @t VALUES('00002' , '2010-01-20' , '2010-01-25')
INSERT @t VALUES('00002' , '2010-02-04' , '2010-02-05')
INSERT @t VALUES('00002' , '2010-02-04' , '2010-02-15')
INSERT @t VALUES('00003' , '2010-02-15' , '2010-02-28')
INSERT @t VALUES('00004' , '2010-03-18' , '2010-03-23')
INSERT @t VALUES('00005' , '2010-03-19' , '2010-03-25')
INSERT @t VALUES('00006' , '2010-01-01' , '2010-04-20')
INSERT @t VALUES('00006' , '2010-01-20' , '2010-01-21')
INSERT @t VALUES('00006' , '2010-01-25' , '2010-01-26')
;WITH tbl AS (
SELECT
*, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY StartDate) AS num
FROM @t
), invalid AS (
SELECT tbl.PersonID
FROM tbl
JOIN tbl _tbl ON
tbl.num = _tbl.num - 1 AND tbl.PersonID = _tbl.PersonID
WHERE DATEDIFF(DAY, (SELECT MAX(tbl3.EndDate) FROM tbl tbl3 WHERE tbl3.num <= tbl.num AND tbl3.PersonID = tbl.PersonID), _tbl.StartDate) > 1
)
SELECT PersonID
FROM tbl
EXCEPT
SELECT PersonID
FROM invalid