岛屿和差距TSQL(islands and gaps tsql)

2019-09-21 06:14发布

我一直在挣扎,应该是其实很简单的一个问题,但读了整整一个星期后,谷歌搜索,实验等等,我的同事,我们不能找到合适的解决方案。 :(

问题:我们有一个具有两个值的表:一个employeenumber(P_ID,INT)<---雇员的识别的日期(开始时间,日期时间)<---时间雇员检查在

  • 我们需要知道每个员工已经工作了什么阶段。
  • 当两个日期都小于@gap天外,他们属于同一时期
  • 对于每个员工可以有多个记录任何一天,但我只需要知道其历史可他的工作,我不感兴趣的部分时间
  • 只要是有差距> @gap天,接下来的日期被认为是新范围的开始
  • 范围是至少1天(例如:21-9-2011 | 2011-09-21),但不具有最大长度。 (一个员工在每一个@gap检查 - 1天应导致从他检查了直到今天第一天期)

我们认为我们需要在这个表格其中天的间隔比@variable较大的岛屿是什么(@gap = 30表示30天)

所以一个例子:

sourceTable会:

P_ID  | starttime
------|------------------
12121 | 24-03-2009 7:30
12121 | 24-03-2009 14:25 
12345 | 27-06-2011 10:00
99999 | 01-05-2012 4:50 
12345 | 27-06-2011 10:30
12345 | 28-06-2011 11:00
98765 | 13-04-2012 10:00
12345 | 21-07-2011 9:00
99999 | 03-05-2012 23:15
12345 | 21-09-2011 12:00
45454 | 12-07-2010 8:00
12345 | 21-09-2011 17:00
99999 | 06-05-2012 11:05
99999 | 20-05-2012 12:45
98765 | 26-04-2012 16:00
12345 | 07-07-2012 14:00
99999 | 01-06-2012 13:55
12345 | 13-08-2012 13:00

现在我需要的结果是:

时期:

P_ID  |   Start    |    End
-------------------------------
12121 | 24-03-2009 | 24-03-2009
12345 | 27-06-2012 | 21-07-2012
12345 | 21-09-2012 | 21-09-2012
12345 | 07-07-2012 | (today) OR 13-08-2012  <-- (less than @gap days ago) OR (last date in table)
45454 | 12-07-2010 | 12-07-2010
45454 | 17-06-2012 | 17-06-2012 
98765 | 13-04-2012 | 26-04-2012
99999 | 01-05-2012 | 01-06-2012

我希望这是明确的这种方式,我已经感谢您阅读为止这一点,那将是巨大的,如果你能做出贡献!

Answer 1:

我已经做了应该让你开始一个粗糙的脚本。 没有打扰炼日期时间和端点的比较可能需要调整。

select 
    P_ID,
    src.starttime,
    endtime = case when src.starttime <> lst.starttime or lst.starttime < DATEADD(dd,-1 * @gap,GETDATE()) then lst.starttime else GETDATE() end,
    frst.starttime,
    lst.starttime
from @SOURCETABLE src
outer apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > DATEADD(dd,-1 * @gap,src.starttime)) frst
outer apply (select starttime = MAX(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and src.starttime > DATEADD(dd,-1 * @gap,sub.starttime)) lst
where src.starttime = frst.starttime
order by P_ID, src.starttime

我得到下面的输出,这是一个利特尔不同的你的,但我认为它的确定:

P_ID        starttime               endtime                 starttime               starttime
----------- ----------------------- ----------------------- ----------------------- -----------------------
12121       2009-03-24 07:30:00.000 2009-03-24 14:25:00.000 2009-03-24 07:30:00.000 2009-03-24 14:25:00.000
12345       2011-06-27 10:00:00.000 2011-07-21 09:00:00.000 2011-06-27 10:00:00.000 2011-07-21 09:00:00.000
12345       2011-09-21 12:00:00.000 2011-09-21 17:00:00.000 2011-09-21 12:00:00.000 2011-09-21 17:00:00.000
12345       2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000 2012-07-07 14:00:00.000
12345       2012-08-13 13:00:00.000 2012-08-16 11:23:25.787 2012-08-13 13:00:00.000 2012-08-13 13:00:00.000
45454       2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000 2010-07-12 08:00:00.000
98765       2012-04-13 10:00:00.000 2012-04-26 16:00:00.000 2012-04-13 10:00:00.000 2012-04-26 16:00:00.000

最后两个输出的cols是结果outer apply的部分,并且就在那里进行调试。

这是基于以下设置:

declare @gap int
set @gap = 30

set dateformat dmy
-----P_ID----|----starttime----
declare @SOURCETABLE table (P_ID int, starttime datetime)
insert @SourceTable values 
(12121,'24-03-2009 7:30'),
(12121,'24-03-2009 14:25'),
(12345,'27-06-2011 10:00'),
(12345,'27-06-2011 10:30'),
(12345,'28-06-2011 11:00'),
(98765,'13-04-2012 10:00'),
(12345,'21-07-2011 9:00'),
(12345,'21-09-2011 12:00'),
(45454,'12-07-2010 8:00'),
(12345,'21-09-2011 17:00'),
(98765,'26-04-2012 16:00'),
(12345,'07-07-2012 14:00'),
(12345,'13-08-2012 13:00')

UPDATE:轻微的反思。 现在使用CTE从每个项目向前和向后工作存在的差距,然后汇总这些:

--Get the gap between each starttime and the next and prev (use 999 to indicate non-closed intervals)
;WITH CTE_Gaps As ( 
    select
        p_id,
        src.starttime,
        nextgap = coalesce(DATEDIFF(dd,src.starttime,nxt.starttime),999), --Gap to the next entry
        prevgap = coalesce(DATEDIFF(dd,prv.starttime,src.starttime),999), --Gap to the previous entry
        isold = case when DATEDIFF(dd,src.starttime,getdate()) > @gap then 1 else 0 end --Is starttime more than gap days ago?
    from
        @SOURCETABLE src
        cross apply (select starttime = MIN(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime > src.starttime) nxt
        cross apply (select starttime = max(starttime) from @SOURCETABLE sub where src.p_id = sub.p_id and sub.starttime < src.starttime) prv   
)
--select * from CTE_Gaps
select
        p_id,
        starttime = min(gap.starttime),
        endtime = nxt.starttime
    from
        CTE_Gaps gap
        --Find the next starttime where its gap to the next > @gap
        cross apply (select starttime = MIN(sub.starttime) from CTE_Gaps sub where gap.p_id = sub.p_id and sub.starttime >= gap.starttime and sub.nextgap > @gap) nxt
group by P_ID, nxt.starttime
order by P_ID, nxt.starttime


Answer 2:

乔恩最明确地告诉我们正确的方向。 性能是可怕的,虽然(在数据库中的400万+记录)。 它看起来像我们缺少一些信息。 有了这些,我们从你身上学到我们想出了下面的解决方案。 它使用经过3个temptables所有建议的答案元素和周期之前的最后喷涌的结果,但性能足够好,以及其生成的数据。

declare @gap int
declare @Employee_id int

set @gap = 30   
set dateformat dmy
--------------------------------------------------------------- #temp1 --------------------------------------------------
CREATE TABLE #temp1 ( EmployeeID int, starttime date)
INSERT INTO #temp1 ( EmployeeID, starttime)

select distinct ck.Employee_id, 
                cast(ck.starttime as date)
from SERVER1.DB1.dbo.checkins pd
        inner join SERVER1.DB1.dbo.Team t on ck.team_id = t.id
where t.productive = 1

--------------------------------------------------------------- #temp2 --------------------------------------------------

create table #temp2 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, FIRSTCHECKIN datetime)
INSERT INTO #temp2 

select Row_number() OVER (partition by EmployeeID ORDER BY t.prev) + 1 as ROWNR,
             EmployeeID,
             DATEADD(DAY, 1, t.Prev) AS start_gap,
           DATEADD(DAY, 0, t.next) AS end_gap
from 
             (
                    select a.EmployeeID,
                                  a.starttime as Prev, 
                                  (
                                  select min(b.starttime)
                                  from #temp1 as b
                                  where starttime > a.starttime and b.EmployeeID = a.EmployeeID 
                                  ) as Next
from #temp1 as a) as t

where  datediff(day, prev, next ) > 30
group by     EmployeeID,
                    t.Prev,
                    t.next
union -- add first known date for Employee 

select      1 as ROWNR,
            EmployeeID,
            NULL,
            min(starttime)
from #temp1 ct
group by ct.EmployeeID

--------------------------------------------------------------- #temp3 --------------------------------------------------

create table #temp3 (ROWNR int, Employeeid int, ENDOFCHECKIN datetime, STARTOFCHECKIN datetime)
INSERT INTO #temp3

select  ROWNR,
        Employeeid,
        ENDOFCHECKIN,
        FIRSTCHECKIN
from #temp2 

union -- add last known date for Employee 

select       (select count(*) from #temp2 b where Employeeid = ct.Employeeid)+1 as ROWNR,
             ct.Employeeid,
            (select dateadd(d,1,max(starttime)) from #temp1 c where Employeeid = ct.Employeeid),
             NULL
from #temp2 ct
group by ct.EmployeeID

---------------------------------------finally check our data-------------------------------------------------


select              a1.Employeeid,
                    a1.STARTOFCHECKIN as STARTOFCHECKIN,
                    ENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN a1.ENDOFCHECKIN ELSE b1.ENDOFCHECKIN END,
                    year(a1.STARTOFCHECKIN) as JaarSTARTOFCHECKIN,
                    JaarENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN  year(a1.ENDOFCHECKIN) ELSE  year(b1.ENDOFCHECKIN) END,
                    Month(a1.STARTOFCHECKIN) as MaandSTARTOFCHECKIN,
                    MaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN  month(a1.ENDOFCHECKIN) ELSE  month(b1.ENDOFCHECKIN) END,
                    (year(a1.STARTOFCHECKIN)*100)+month(a1.STARTOFCHECKIN) as JaarMaandSTARTOFCHECKIN,
                    JaarMaandENDOFCHECKIN = CASE WHEN b1.ENDOFCHECKIN <= a1.STARTOFCHECKIN THEN (year(a1.ENDOFCHECKIN)*100)+month(a1.STARTOFCHECKIN) ELSE (year(b1.ENDOFCHECKIN)*100)+month(b1.ENDOFCHECKIN) END,
                    datediff(M,a1.STARTOFCHECKIN,b1.ENDOFCHECKIN) as MONTHSCHECKEDIN
from #temp3 a1
       full outer join #temp3 b1 on a1.ROWNR = b1.ROWNR -1 and a1.Employeeid = b1.Employeeid
where not (a1.STARTOFCHECKIN is null AND b1.ENDOFCHECKIN is null) 
order by a1.Employeeid, a1.STARTOFCHECKIN


文章来源: islands and gaps tsql