Can I do this in SQL function without a cursor?

2019-03-31 05:09发布

问题:

I'm working on a timesheet database. In simple terms, the TimesheetEntries table has four columns

ID int (identity, 1, 1)
StaffID int
ClockedIn datetime
ClockedOut datetime

I've been asked to write a report to show staff attendance by date range. The user puts in a date, and the report outputs the clocking in and out times of all attending staff members together with their duration on-site.

However, and this is where it gets tricky, staff members sometimes clock out to leave the site for short periods, and the report is required to ignore these (when they leave the site for less than 2 hours).

So, let's assume the following entries

ID  StaffID  ClockedIn    ClockedOut
1   4        0900         1200
2   4        1330         1730
3   5        0900         1200
4   5        1409         1730
5   4        1830         1930

The report's output SHOULD be

StaffID  ClockedIn    ClockedOut
4        0900         1930
5        0900         1200     
5        1409         1730  

Is there any way of doing this without a cursor or even a cursor nested inside a cursor (which is where I'm at right now!)? We're not talking about huge datasets here and performance isn't really an issue (it's a report, not a production system) but I really don't like cursors if I can avoid them.

Thanks

Edward

回答1:

I used the data from Jeremy's response above but went at the problem a whole different way. This uses a recursive CTE, which I think requires SQL Server 2005. It reports the results accurately (I believe) and also reports the number of clock-ins recorded during the timeframe and the total number of minutes off (can be more than 120 because the restriction is simply that each offsite period is less than two hours).

declare @TimeSheetEntries table 
    ( 
    ID int identity not null primary key, 
    StaffID int not null, 
    ClockedIn datetime not null, 
    ClockedOut datetime not null 
    ); 

insert into @TimeSheetEntries 
    ( 
    StaffID, 
    ClockedIn, 
    ClockedOut 
    ) 
select 
    4, 
    '2012-01-01 09:00:00', 
    '2012-01-01 12:00:00' 
union all select 
    4, 
    '2012-01-01 13:30:00', 
    '2012-01-01 17:30:00' 
union all select 
    5, 
    '2012-01-01 09:00:00', 
    '2012-01-01 12:00:00' 
union all select 
    5, 
    '2012-01-01 14:09:00', 
    '2012-01-01 17:30:00'
union all select
    4,
    '2012-01-01 18:30:00', 
    '2012-01-01 19:30:00';


WITH ClockData AS
(
    SELECT ID, StaffID, ClockedIn, ClockedOut AS EffectiveClockout, 1 AS NumClockIns, 0 AS MinutesOff
    FROM @TimeSheetEntries ts
    WHERE NOT EXISTS (SELECT ID FROM @TimeSheetEntries tsWhere WHERE tsWhere.ClockedOut BETWEEN DATEADD(hour, -2, ts.ClockedIn) AND ts.ClockedIn)

    UNION ALL

    SELECT cd.ID, cd.StaffID, cd.ClockedIn, ts.ClockedOut AS EffectiveClockout, cd.NumClockIns + 1 AS NumClockIns, cd.MinutesOff + DateDiff(minute, cd.EffectiveClockout, ts.ClockedIn) AS MinutesOff
    FROM @TimeSheetEntries ts
    INNER JOIN ClockData cd
        ON ts.StaffID = cd.StaffID
            AND ts.ClockedIn BETWEEN cd.EffectiveClockout AND dateadd(hour, 2, cd.EffectiveClockout)
)
SELECT *
FROM ClockData cd
WHERE NumClockIns = (SELECT MAX(NumClockIns) FROM ClockData WHERE ID = cd.ID)

This returns:

ID   StaffID   ClockedIn                 EffectiveClockout        NumClockIns   MinutesOff
3    5         2012-01-01 09:00:00.000   2012-01-01 12:00:00.000  1             0
4    5         2012-01-01 14:09:00.000   2012-01-01 17:30:00.000  1             0
1    4         2012-01-01 09:00:00.000   2012-01-01 19:30:00.000  3             150

UPDATE

In case it isn't clear, the MinutesOff is only the 'allowance' time, or the amount of time 'eaten' between the ClockedIn and EffectiveClockout shown in the same row. So, StaffID 5 took 129 minutes off between clocked time periods, but no allowance time, so MinutesOff is 0 for both rows.



回答2:

I'm sure there's less complicated ways to do this, but I was able to pull it off with a couple of CTEs:

declare @TimeSheetEntries table
    (
    ID int identity not null primary key,
    StaffID int not null,
    ClockedIn datetime not null,
    ClockedOut datetime not null
    );

insert into @TimeSheetEntries
    (
    StaffID,
    ClockedIn,
    ClockedOut
    )
select
    4,
    '2012-01-01 09:00:00',
    '2012-01-01 12:00:00'
union all select
    4,
    '2012-01-01 13:30:00',
    '2012-01-01 17:30:00'
union all select
    5,
    '2012-01-01 09:00:00',
    '2012-01-01 12:00:00'
union all select
    5,
    '2012-01-01 14:09:00',
    '2012-01-01 17:30:00'
union all select 
    4, 
    '2012-01-01 18:30:00', 
    '2012-01-01 19:30:00'       
;
with MultiCheckins as (
    select distinct
        StaffID,
        cast(cast(cast(ClockedIn as float) as int) as datetime) as TimeSheetDate,
        rank() over (
            partition by StaffID, 
            cast(cast(cast(ClockedIn as float) as int) as datetime)
            order by ClockedIn
            ) as ordinal,
        ClockedIn,
        ClockedOut
    from @TimeSheetEntries
), Organized as
(
select
    row_number() over (
        order by
            mc.StaffID,
            mc.TimeSheetDate,
            mc.ClockedIn,
            mc.ClockedOut
            ) as RowID,
    mc.StaffID,
    mc.TimeSheetDate,
    case
        when datediff(hour, coalesce(mc3.ClockedOut, mc.ClockedIn), mc.ClockedIn) >= 2
            then mc.ClockedIn 
        else coalesce(mc3.ClockedIn, mc.ClockedIn)
        end as ClockedIn,
    case 
        when datediff(hour, mc.ClockedOut, coalesce(mc2.ClockedIn, mc.ClockedOut)) < 2
            then coalesce(mc2.ClockedOut, mc.ClockedOut)
        else mc.ClockedOut
        end as ClockedOut
from
    MultiCheckins as mc
left outer join
    MultiCheckIns as mc3
        on mc3.StaffID = mc.StaffID
        and mc3.TimeSheetDate = mc.TimeSheetDate
        and mc3.ordinal =  mc.ordinal - 1
left outer join 
    MultiCheckIns as mc2
        on mc2.StaffID = mc.StaffID
        and mc2.TimeSheetDate = mc.TimeSheetDate
        and mc2.ordinal = mc.ordinal + 1
)
select distinct
    o.StaffID,
    o.ClockedIn,
    o.ClockedOut
from Organized as o
where
    not exists (
        select null from Organized as o2
        where o2.RowID <> o.RowID
        and o2.StaffID = o.StaffID
        and 
            (
            o.ClockedIn between o2.ClockedIn and o2.ClockedOut
            and o.ClockedOut between o2.ClockedIn and o2.ClockedOut
            )
        )


回答3:

Option 1 : Maybe insert it into a temp table and then use a left join to build a results table (if they can only clock in and out twice during the day this will work if you have 3 results it will not)

select *
from timesheet ts
left join timesheet tss on tss.id = ts.id

After this you can just get the min and max or even have a more robust report.

Option 2 :

create #TimeTable Table (UserID int, InTime int, OutTime int)

insert into #TimeTable (UserID) select distinct StaffID

Update #TimeTable set InTime = (select Min(InTime) from #TimeTable where StaffID = s.StaffID)  from #TimeTAble s

Update #TimeTable set OutTime = (Select Max(OutTime) from #TimeTable where StaffID = s.StaffID) from #TimeTable s

Given mroe time i'd merge these into two quick queries but three would work for not worrying about performance.



回答4:

An iterative set-based approach:

-- Sample data.
declare @TimesheetEntries as Table ( Id Int Identity, StaffId Int, ClockIn DateTime, ClockOut DateTime )
insert into @TimesheetEntries ( StaffId, ClockIn, ClockOut ) values
  ( 4, '2012-05-03 09:00', '2012-05-03 12:00' ),
  ( 4, '2012-05-03 13:30', '2012-05-03 17:30' ), -- This falls within 2 hours of the next two rows.
  ( 4, '2012-05-03 17:35', '2012-05-03 18:00' ),
  ( 4, '2012-05-03 19:00', '2012-05-03 19:30' ),
  ( 4, '2012-05-03 19:45', '2012-05-03 20:00' ),
  ( 5, '2012-05-03 09:00', '2012-05-03 12:00' ),
  ( 5, '2012-05-03 14:09', '2012-05-03 17:30' ),
  ( 6, '2012-05-03 09:00', '2012-05-03 12:00' ),
  ( 6, '2012-05-03 13:00', '2012-05-03 17:00' )
select Id, StaffId, ClockIn, ClockOut from @TimesheetEntries

-- Find all of the periods that need to be coalesced and start the process.
declare @Bar as Table ( Id Int Identity, StaffId Int, ClockIn DateTime, ClockOut DateTime )
insert into @Bar
  select TSl.StaffId, TSl.ClockIn, TSr.ClockOut
    from @TimesheetEntries as TSl inner join
      -- The same staff member and the end of the left period is within two hours of the start of the right period.
      @TimesheetEntries as TSr on TSr.StaffId = TSl.StaffId and DateDiff( ss, TSl.ClockOut, TSr.ClockIn ) between 0 and 7200

-- Continue coalescing periods until we run out of work.
declare @Changed as Bit = 1
while @Changed = 1
  begin
  set @Changed = 0
  -- Coalesce periods.
  update Bl
    -- Take the later   ClockOut   time from the two rows.
    set ClockOut = case when Br.ClockOut >= Bl.ClockOut then Br.ClockOut else Bl.ClockOut end
    from @Bar as Bl inner join
      @Bar as Br on Br.StaffId = Bl.StaffId and
        -- The left row started before the right and either the right period is completely contained in the left or the right period starts within two hours of the end of the left.
        Bl.ClockIn < Br.ClockIn and ( Br.ClockOut <= Bl.ClockOut or DateDiff( ss, Bl.ClockOut, Br.ClockIn ) < 7200 )
  if @@RowCount > 0
    set @Changed = 1
  -- Delete rows where one period is completely contained in another.
  delete Br
    from @Bar as Bl inner join
      @Bar as Br on Br.StaffId = Bl.StaffId and
        ( ( Bl.ClockIn < Br.ClockIn and Br.ClockOut <= Bl.ClockOut ) or ( Bl.ClockIn <= Br.ClockIn and Br.ClockOut < Bl.ClockOut ) )
  if @@RowCount > 0
    set @Changed = 1
  end

-- Return all of the coalesced periods ...
select StaffId, ClockIn, ClockOut, 'Coalesced Periods' as [Type]
  from @Bar
union all
-- ... and all of the independent periods.
select StaffId, ClockIn, ClockOut, 'Independent Period'
  from @TimesheetEntries as TS
  where not exists ( select 42 from @Bar where StaffId = TS.StaffId and ClockIn <= TS.ClockIn and TS.ClockOut <= ClockOut )
order by ClockIn, StaffId

I'm sure that there are some optimizations that ought to be made.



回答5:

I think you can do this pretty easily with just a left join back to itself and a one-off match. The following is not a full implementation, but more of a proof of concept:

create table #TimeSheetEntries 
    ( 
    ID int identity not null primary key, 
    StaffID int not null, 
    ClockedIn datetime not null, 
    ClockedOut datetime not null 
    ); 

insert into #TimeSheetEntries 
    ( 
    StaffID, 
    ClockedIn, 
    ClockedOut 
    ) 
select 
    4, 
    '2012-01-01 09:00:00', 
    '2012-01-01 12:00:00' 
union all select 
    4, 
    '2012-01-01 13:30:00', 
    '2012-01-01 17:30:00' 
union all select 
    5, 
    '2012-01-01 09:00:00', 
    '2012-01-01 12:00:00' 
union all select 
    5, 
    '2012-01-01 14:09:00', 
    '2012-01-01 17:30:00'
union all select
    4,
    '2012-01-01 18:30:00', 
    '2012-01-01 19:30:00'
union all select 4, '2012-01-01 18:30:00', '2012-01-01 19:30:00';


select * from #timesheetentries tse1
left outer join #timesheetentries tse2 on tse1.staffid = tse2.staffid 
  and tse2.id = 
  (
      select MAX(ID) 
      from #timesheetentries ts_max 
      where ts_max.id < tse1.id and tse1.staffid = ts_max.staffid
  )
  outer apply   
  (
  select DATEDIFF(minute, tse2.clockedout, tse1.clockedin) as BreakTime
  ) as breakCheck

where BreakTime > 120 or BreakTime < 0 or tse2.id is null

order by tse1.StaffID, tse1.ClockedIn


   GO
   drop table #timesheetentries
   GO

The idea here is that you have your original timesheet table tse1, and then you do a left join to the same timesheet table, aliased as tse2 and matching rows when staffID is the same and tse2.ID is the highest ID value that is still less than tse1.ID. This is clearly poor form - you'd probably want to use ROW_NUMBER() for this ID comparison, partitioned and ordered by StaffID and your ClockedIn/ClockedOut values, since times could have been input out of chronological order.

At this point, a row from the joined tables now contains time data from the current timesheet entry, as well as the one before it. This means that we can make a comparison between ClockedIn/ClockedOut values for consecutive time entries... and using DATEDIFF(), we can find out the length of time that a user has been away between their previous Clockedout and more recent ClockedIn values. I used an OUTER APPLY for this simply because it makes the code cleaner, but you could probably pack it into a subquery.

Once we do a DATEDIFF(), it's trivial to find cases where an individual's BreakTime does not exceed the 120-minute barrier and remove those timesheet entries, leaving only the significant rows of the employee's timesheet to be used in your later reporting.