I searched night and day back when I was first starting out in the sql world for an answer to this question. Could not find anything similar to this for my needs so I decided to ask and answer my own question in case others need help like I did.
Here is an example of the data I have. For simplicity, it is all from the Job table. Each JobID has it's own Start and End time that are basically random and can overlap, have gaps, start and end at the same time as other jobs etc.
--Available--
JobID WorkerID JobStart JobEnd
1 25 '2012-11-17 16:00' '2012-11-17 17:00'
2 25 '2012-11-18 16:00' '2012-11-18 16:50'
3 25 '2012-11-19 18:00' '2012-11-19 18:30'
4 25 '2012-11-19 17:30' '2012-11-19 18:10'
5 26 '2012-11-18 16:00' '2012-11-18 17:10'
6 26 '2012-11-19 16:00' '2012-11-19 16:50'
What I wanted the result of the query to show would be:
WorkerID TotalTime(in Mins)
25 170
26 120
EDIT: Forgot to mention that the overlaps need to be ignored. Basically this is supposed to treat these workers and their jobs like you would an hourly employee and not a contractor. Like if I worked two jobIDs and started and finished them both from 12:00pm to 12:30pm, as an employee I would only get paid for 30 mins, whereas a contractor would likely get paid 60 mins, since their jobs are treated individually and get paid per job. The point of this query is to analyze jobs in a database that are tied to a worker, and need to find out if that worker was treated as an employee, what would his total hours worked in a given set of time come out to be.
EDIT2: won't let me answer my own question for 7 hours, will move it there later.
Ok, Answering Question now. Basically, I use temp table to build each minute between the min and max datetime of the jobs I am looking up.
IF OBJECT_ID('tempdb..#time') IS NOT NULL
BEGIN
drop table #time
END
DECLARE @FromDate AS DATETIME,
@ToDate AS DATETIME,
@Current AS DATETIME
SET @FromDate = '2012-11-17 16:00'
SET @ToDate = '2012-11-19 18:30'
create table #time (cte_start_date datetime)
set @current = @FromDate
while (@current < @ToDate)
begin
insert into #time (cte_start_date)
values (@current)
set @current = DATEADD(n, 1, @current)
end
Now I have all the mins in a temp table. Now I need to join all the Job table info into it and select out what I need in one go.
SELECT J.WorkerID
,COUNT(DISTINCT t.cte_start_date) AS TotalTime
FROM #time AS t
INNER JOIN Job AS J ON t.cte_start_date >= J.JobStart AND t.cte_start_date < J.JobEnd --Thanks ErikE
GROUP BY J.WorkerID --Thanks Martin Parkin
drop table #time
That is the very simplified answer and is good to get someone started.
A query such as the following should provide the answer you are looking for:
Apologies that it is untested (I have no SQL Server to test it here) but it should do the trick.
This is a complicated query. Explanation follows.
The key to understanding this approach is to understand the "overlap" logic. One time period overlaps with the next when the next start time is before the previous end time. By assigning an overlap flag to each record, we know if it overlaps with the "next" record. The above logic is using the start time for this. It might be better to use the JobId, especially if two jobs for the same worker could start at the same time.
The calculation of the overlap flag uses a correlated subquery (this is
j
in thewith
clause).Then, for each record we go back and find the first record afterwards where the
overlap
value is NULL. This provides a grouping key for all records in a given overlap set.The rest, then, is just to aggregate the results, first at the
workerId
/group level and then at theworkerId
level to get the final results.I have not run this SQL, so it might have syntax errors.
This query does the job as well. Its performance is very good (while the execution plan looks not so great, the actual CPU and IO beat many other queries).
See it working in a Sql Fiddle.
With a clustered index on
WorkerID, JobStart, JobEnd, JobID
, and with the sample 7 rows from the above fiddle a template for new worker/job data repeated enough times to yield a table with 14,336 rows, here are the performance results. I've included the other working/correct answers on the page (so far):I did a more exhaustive test from a different (slower) server (where each query was run 25 times, the best and worst values for each metric were thrown out, and the remaining 23 values were averaged) and got the following:
The alternate technique I thought to be sure to improve things. Well, it saved 6 reads, but cost a lot more CPU (which makes sense). Instead of carrying through the start/end statistics of each timeslice to the end, it is best just recalculating which slices to keep with the
EXISTS
against the original data. It may be that a different profile of few workers with many jobs could change the performance statistics for different queries.In case anyone wants to try it, use the
CREATE TABLE
andINSERT
statements from my fiddle and then run this 11 times:I built two other solutions to this query but the best one with about double the performance had a fatal flaw (not correctly handling fully enclosed time ranges). The other had very high/bad statistics (which I knew but had to try).
Explanation
Using all the endpoint times from each row, build up a distinct list of all possible time ranges of interest by duplicating each endpoint time and then grouping in such a way as to pair each time with the next possible time. Sum the elapsed minutes of these ranges wherever they coincide with any actual worker's working time.