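I have been working on the problem below but have not gotten a result, and the deadline is approaching. The table also has more than a million rows. I would appreciate your help with it.
Goal: group the results by member and build continuous coverage for each member by combining individual date ranges that either overlap or run consecutively, with no break between the end of one range and the start of the next.
I have data in the following format: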
MemberCode ----- ClaimID ----- StartDate ----- EndDate
00001 ----- 012345 ----- 2010-01-15 ----- 2010-01-20
00001 ----- 012350 ----- 2010-01-19 ----- 2010-01-22
00001 ----- 012352 ----- 2010-01-20 ----- 2010-01-25
00001 ----- 012355 ----- 2010-01-26 ----- 2010-01-30
00002 ----- 012357 ----- 2010-01-20 ----- 2010-01-25
00002 ----- 012359 ----- 2010-01-30 ----- 2010-02-05
00002 ----- 012360 ----- 2010-02-04 ----- 2010-02-15
00003 ----- 012365 ----- 2010-02-15 ----- 2010-02-30
...
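Member 00001 above is a valid member because it has a continuous date range from 2010-01-15 to 2010-01-30 (no gaps). Note that claim ID 012355 for this member starts on the day immediately after the end date of claim ID 012352. This is still valid because it forms a continuous range.
However, member 00002 should be an invalid member, because there is a gap of 5 days between the end date of claim ID 012357 and the start date of claim ID 012359.
What I am trying to do is get a list of only those distinct members who have a claim on every single day of their date range, with no gaps between MIN(StartDate) and MAX(EndDate) for each member. Members who have gaps are discarded.
Thanks in advance.
UPDATE:
This is as far as I have gotten. Note: FILLED_DT = Start Date and PresCoverEndDT = End Date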
SELECT PresCoverEndDT, FILLED_DT
FROM
(
SELECT DISTINCT FILLED_DT, ROW_NUMBER() OVER (ORDER BY FILLED_DT) RN
FROM Temp_Claims_PRIOR_STEP_5 T1
WHERE NOT EXISTS
(SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2
WHERE T1.FILLED_DT > T2.FILLED_DT AND T1.FILLED_DT< T2.PresCoverEndDT
AND T1.MBR_KEY = T2.MBR_KEY )
) T1
JOIN (SELECT DISTINCT PresCoverEndDT, ROW_NUMBER() OVER (ORDER BY PresCoverEndDT) RN
FROM Temp_Claims_PRIOR_STEP_5 T1
WHERE NOT EXISTS
(SELECT * FROM Temp_Claims_PRIOR_STEP_5 T2
WHERE T1.PresCoverEndDT > T2.FILLED_DT AND T1.PresCoverEndDT < T2.PresCoverEndDT AND T1.MBR_KEY = T2.MBR_KEY )
) T2
ON T1.RN - 1 = T2.RN
WHERE PresCoverEndDT < FILLED_DT
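The code above seems to have errors, because I only get one row, and even that row is incorrect. My desired output is a single column, as follows: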
Valid_Member_Code
00001
00007
00009
... and so on.
Try this: http://www.sqlfiddle.com/#!3/c3365/20
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);
See the query progression here: http://www.sqlfiddle.com/#!3/c3365/20
How it works: compare each claim's end date with the next claim's start date and check the gap in days:
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1;
Output:
| MEMBERCODE | STARTDATE | ENDDATE | NEXTSTARTDATE | GAP |
--------------------------------------------------------------
| 1 | 2010-01-15 | 2010-01-20 | 2010-01-19 | -1 |
| 1 | 2010-01-19 | 2010-01-22 | 2010-01-20 | -2 |
| 1 | 2010-01-20 | 2010-01-25 | 2010-01-26 | 1 |
| 2 | 2010-01-20 | 2010-01-25 | 2010-01-30 | 5 |
| 2 | 2010-01-30 | 2010-02-05 | 2010-02-04 | -1 |
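To make the GAP column concrete, here is a minimal sketch that evaluates DATEDIFF on three date pairs taken from the output above. The column aliases are made up for illustration. A value of 1 means the next claim starts on the day right after the current one ends, which is why the queries in this answer treat gap <= 1 as "no gap".

```sql
-- Date pairs copied from the output above; the aliases are illustrative only.
SELECT DATEDIFF(d, '2010-01-25', '2010-01-26') AS adjacent_gap,  -- 1: next claim starts the following day
       DATEDIFF(d, '2010-01-25', '2010-01-30') AS broken_gap,    -- 5: several uncovered days in between
       DATEDIFF(d, '2010-01-20', '2010-01-19') AS overlap_gap;   -- -1: next claim starts before this one ends
```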
Then check whether each member's count of gapless rows equals its total count of rows:
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode, count(*) as count, sum(case when gap <= 1 then 1 end) as gapless_count
from gaps
group by membercode;
Output:
| MEMBERCODE | COUNT | GAPLESS_COUNT |
--------------------------------------
| 1 | 3 | 3 |
| 2 | 2 | 1 |
Finally, filter down to the members with no gaps in their claims:
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(*);
Output:
| MEMBERCODE |
--------------
| 1 |
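Note that you don't need COUNT(*) > 1 to detect members who have two or more claims. Because we use JOIN rather than LEFT JOIN, members who have yet to have a second claim are discarded automatically. Here is the (longer) version if you opt to use LEFT JOIN instead (same output as above):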
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
and count(*) > 1; -- members who have two or more claims only
Here is how the data for the above query looks before filtering:
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select * from gaps;
Output:
| MEMBERCODE | STARTDATE | ENDDATE | NEXTSTARTDATE | GAP |
-----------------------------------------------------------------
| 1 | 2010-01-15 | 2010-01-20 | 2010-01-19 | -1 |
| 1 | 2010-01-19 | 2010-01-22 | 2010-01-20 | -2 |
| 1 | 2010-01-20 | 2010-01-25 | 2010-01-26 | 1 |
| 1 | 2010-01-26 | 2010-01-30 | (null) | (null) |
| 2 | 2010-01-20 | 2010-01-25 | 2010-01-30 | 5 |
| 2 | 2010-01-30 | 2010-02-05 | 2010-02-04 | -1 |
| 2 | 2010-02-04 | 2010-02-15 | (null) | (null) |
| 3 | 2010-02-15 | 2010-03-02 | (null) | (null) |
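The (null) gap on each member's last row is the reason the LEFT JOIN query compares against count(gap) instead of count(*): COUNT(*) counts every row, while COUNT(expression) skips NULLs. The sketch below shows the difference using member 1's four gap values from the output above. The inline derived table g(gap) is a made-up stand-in for illustration, not part of the fiddle.

```sql
-- Hypothetical inline data: member 1's gaps, including the trailing NULL.
SELECT COUNT(*)                           AS all_rows,       -- 4: every row, NULL included
       COUNT(gap)                         AS non_null_gaps,  -- 3: the NULL gap is skipped
       SUM(CASE WHEN gap <= 1 THEN 1 END) AS gapless_count   -- 3: rows with gap <= 1
FROM (VALUES (-1), (-2), (1), (NULL)) AS g(gap);
```

Since gapless_count matches non_null_gaps but not all_rows, comparing against count(*) would wrongly reject member 1.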
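EDIT on requirement clarification:
Per your clarification, you also want to include members who have yet to have a second claim, so do this instead: http://sqlfiddle.com/#!3/c3365/22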
with s as
(
select *, row_number() over(partition by membercode order by startdate) rn
from tbl
)
,gaps as
(
select a.membercode, a.startdate, a.enddate, b.startdate as nextstartdate
,datediff(d, a.enddate, b.startdate) as gap
from s a
left join s b on b.membercode = a.membercode and b.rn = a.rn + 1
)
select membercode
from gaps
group by membercode
having sum(case when gap <= 1 then 1 end) = count(gap)
-- members who have yet to have a second claim are valid too
or count(nextstartdate) = 0;
Output:
| MEMBERCODE |
--------------
| 1 |
| 3 |
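The technique is to count the member's nextstartdate values. If a member has no next start date at all (i.e. count(nextstartdate) = 0), it has only a single claim and is valid too, so just attach this OR condition: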
or count(nextstartdate) = 0;
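Actually, the condition below would suffice too. I wanted the query to be more self-documenting, though, which is why I recommend counting the member's nextstartdate. Here is an alternative condition for detecting members who have yet to have a second claim: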
or count(*) = 1;
By the way, we also have to change the comparison from this:
sum(case when gap <= 1 then 1 end) = count(*)
to this (because we are using LEFT JOIN now):
sum(case when gap <= 1 then 1 end) = count(gap)
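As a possible variation for the million-row table, the self-join on row numbers might be replaced by the LEAD window function. This is only a sketch under two assumptions: it needs SQL Server 2012 or later (the fiddle above targets an older version), and the table variable @claims is a hypothetical stand-in for tbl, filled with the sample rows from the question. Like the queries above, it only compares each claim with the next one in start-date order.

```sql
-- Sample rows copied from the question; @claims is a hypothetical stand-in for tbl.
DECLARE @claims TABLE (membercode INT, startdate DATE, enddate DATE);
INSERT @claims VALUES
 (1, '2010-01-15', '2010-01-20'),
 (1, '2010-01-19', '2010-01-22'),
 (1, '2010-01-20', '2010-01-25'),
 (1, '2010-01-26', '2010-01-30'),
 (2, '2010-01-20', '2010-01-25'),
 (2, '2010-01-30', '2010-02-05'),
 (2, '2010-02-04', '2010-02-15'),
 (3, '2010-02-15', '2010-03-02');

WITH gaps AS (
    SELECT membercode,
           -- Days between this claim's end and the next claim's start;
           -- NULL for each member's last claim.
           DATEDIFF(d, enddate,
                    LEAD(startdate) OVER (PARTITION BY membercode
                                          ORDER BY startdate)) AS gap
    FROM @claims
)
SELECT membercode
FROM gaps
GROUP BY membercode
-- Keep members with no gap longer than one day; single-claim members
-- have only a NULL gap and are therefore kept as well.
HAVING COUNT(CASE WHEN gap > 1 THEN 1 END) = 0;
```

On this sample data the query returns members 1 and 3, the same result as the LEFT JOIN version above.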
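Try this. It partitions the rows by MemberCode and gives them ordinal numbers. It then compares each row with the row that has the next num value. If the difference between a row's end date and the next row's start date is greater than one day, the member is invalid: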
DECLARE @t TABLE (MemberCode VARCHAR(100), ClaimID INT, StartDate DATETIME, EndDate DATETIME)
INSERT @t
VALUES
('00001' , 012345 , '2010-01-15' , '2010-01-20')
,('00001' , 012350 , '2010-01-19' , '2010-01-22')
,('00001' , 012352 , '2010-01-20' , '2010-01-25')
,('00001' , 012355 , '2010-01-26' , '2010-01-30')
,('00002' , 012357 , '2010-01-20' , '2010-01-25')
,('00002' , 012359 , '2010-01-30' , '2010-02-05')
,('00002' , 012360 , '2010-02-04' , '2010-02-15')
,('00003' , 012365 , '2010-02-15' , '2010-02-28')
,('00004' , 012366 , '2010-03-18' , '2010-03-23')
,('00005' , 012367 , '2010-03-19' , '2010-03-25')
,('00006' , 012368 , '2010-03-20' , '2010-03-21')
;WITH tbl AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY MemberCode ORDER BY StartDate)
AS num
FROM @t
), invalid AS (
SELECT tbl.MemberCode
FROM tbl
JOIN tbl _tbl ON
tbl.num = _tbl.num - 1
AND tbl.MemberCode = _tbl.MemberCode
WHERE DATEDIFF(DAY, tbl.EndDate, _tbl.StartDate) > 1
)
SELECT MemberCode
FROM tbl
EXCEPT
SELECT MemberCode
FROM invalid
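I think your query gives false negatives, because it only checks the gap between consecutive rows. As I see it, a gap can be covered by one of the earlier rows. Let me give an example:
Row 1: 2010-01-01 | 2010-01-31
Row 2: 2010-01-10 | 2010-01-15
Row 3: 2010-01-20 | 2010-01-25
Your code will report a gap between row 2 and row 3, even though it is filled by row 1, and it has no way to detect that. You should use the MAX(EndDate) of all previous rows in the DATEDIFF function.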
DECLARE @t TABLE (PersonID VARCHAR(100), StartDate DATETIME, EndDate DATETIME)
INSERT @t VALUES('00001' , '2010-01-01' , '2010-01-17')
INSERT @t VALUES('00001' , '2010-01-19' , '2010-01-22')
INSERT @t VALUES('00001' , '2010-01-20' , '2010-01-25')
INSERT @t VALUES('00001' , '2010-01-26' , '2010-01-31')
INSERT @t VALUES('00002' , '2010-01-20' , '2010-01-25')
INSERT @t VALUES('00002' , '2010-02-04' , '2010-02-05')
INSERT @t VALUES('00002' , '2010-02-04' , '2010-02-15')
INSERT @t VALUES('00003' , '2010-02-15' , '2010-02-28')
INSERT @t VALUES('00004' , '2010-03-18' , '2010-03-23')
INSERT @t VALUES('00005' , '2010-03-19' , '2010-03-25')
INSERT @t VALUES('00006' , '2010-01-01' , '2010-04-20')
INSERT @t VALUES('00006' , '2010-01-20' , '2010-01-21')
INSERT @t VALUES('00006' , '2010-01-25' , '2010-01-26')
;WITH tbl AS (
SELECT
*, ROW_NUMBER() OVER (PARTITION BY PersonID ORDER BY StartDate) AS num
FROM @t
), invalid AS (
SELECT tbl.PersonID
FROM tbl
JOIN tbl _tbl ON
tbl.num = _tbl.num - 1 AND tbl.PersonID = _tbl.PersonID
WHERE DATEDIFF(DAY, (SELECT MAX(tbl3.EndDate) FROM tbl tbl3 WHERE tbl3.num <= tbl.num AND tbl3.PersonID = tbl.PersonID), _tbl.StartDate) > 1
)
SELECT PersonID
FROM tbl
EXCEPT
SELECT PersonID
FROM invalid
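The correlated MAX(EndDate) subquery above could also be written as a windowed running maximum. This is only a sketch, assuming SQL Server 2012 or later for the ROWS frame. The CTE name covered and the column PrevMaxEnd are made up for illustration, and the sample rows are a subset of the data above.

```sql
-- Subset of the sample data above: 00006 has inner ranges covered by a long
-- first range, and 00002 has a real gap.
DECLARE @t TABLE (PersonID VARCHAR(100), StartDate DATE, EndDate DATE);
INSERT @t VALUES
 ('00002', '2010-01-20', '2010-01-25'),
 ('00002', '2010-02-04', '2010-02-15'),
 ('00006', '2010-01-01', '2010-04-20'),
 ('00006', '2010-01-20', '2010-01-21'),
 ('00006', '2010-01-25', '2010-01-26');

WITH covered AS (
    SELECT PersonID, StartDate,
           -- Latest end date among all earlier rows of the same person;
           -- NULL for the person's first row.
           MAX(EndDate) OVER (PARTITION BY PersonID
                              ORDER BY StartDate, EndDate
                              ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS PrevMaxEnd
    FROM @t
)
SELECT PersonID
FROM covered
GROUP BY PersonID
-- A person is valid when no row starts more than one day after everything
-- before it has ended.
HAVING MAX(CASE WHEN DATEDIFF(DAY, PrevMaxEnd, StartDate) > 1 THEN 1 ELSE 0 END) = 0;
```

On this subset the query returns only 00006: its inner ranges fall within the first range, while 00002 has an uncovered stretch between 2010-01-25 and 2010-02-04.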
Source: How to identify the first gap in multiple start and end date ranges for each distinct member in T-SQL