Sum of minutes between multiple date ranges

2020-03-02 04:32发布

问题:

The scenario is users specify when they are available, these specified times can overlap each other. I'm trying to get the total time they are available for. Example with SQL Fiddle:

--Available--
ID  userID  availStart          availEnd
1   456     '2012-11-19 16:00'  '2012-11-19 17:00'
2   456     '2012-11-19 16:00'  '2012-11-19 16:50'
3   456     '2012-11-19 18:00'  '2012-11-19 18:30'
4   456     '2012-11-19 17:30'  '2012-11-19 18:10'
5   456     '2012-11-19 16:00'  '2012-11-19 17:10'
6   456     '2012-11-19 16:00'  '2012-11-19 16:50'

The output should be 130 minutes:

1: 60
2: 0 as falls inside 1
3: 30
4: 30 as the last 10 mins is covered by 3
5: 10 as first 60 mins is covered by 1
6: 0 as falls inside 1

I can get the total overlapping minutes, however this is more than the SUM of the available minutes:

SQL Fiddle

Any ideas how I can achieve this?

EDIT 21st Nov 12: Thanks so far for everyone's solutions - in a way I'm happy to see this wasn't an 'easy' query to write.

EDIT 23rd Nov 12: This is all great work. Internally here we're thinking it might be best to ensure users cannot enter overlapping times (eg forcing them to amend an existing entry)!

回答1:

Gordon Linoff has a CTE based answer

I've done some performance analysis on all the working algorithms Blank values mean it took too long. This is tested on a single Core i7 X920 @2GHz chip, backed by a couple of SSDs. The only index created was a cluster on UserID, AvailStart. If you think you can improve any of the performance, let me know.

This CTE version was worse than linear, SQL Server can't do the RN = RN + 1 join in an efficient way. I rectified this with a hybrid approach below, where I save and index the first CTE into a table variable. This still takes ten times as much IO as the cursor based approach.

With OrderedRanges as (
  Select
    Row_Number() Over (Partition By UserID Order By AvailStart) AS RN,
    AvailStart,
    AvailEnd
  From
    dbo.Available
  Where
    UserID = 456
),
AccumulateMinutes (RN, Accum, CurStart, CurEnd) as (
  Select
    RN, 0, AvailStart, AvailEnd
  From
    OrderedRanges
  Where 
    RN = 1
  Union All
  Select
    o.RN, 
    a.Accum + Case When o.AvailStart <= a.CurEnd Then
        0
      Else 
        DateDiff(Minute, a.CurStart, a.CurEnd)
      End,
    Case When o.AvailStart <= a.CurEnd Then 
        a.CurStart
      Else
        o.AvailStart
      End,
    Case When o.AvailStart <= a.CurEnd Then
        Case When a.CurEnd > o.AvailEnd Then a.CurEnd Else o.AvailEnd End
      Else
        o.AvailEnd
      End
  From
    AccumulateMinutes a
        Inner Join 
    OrderedRanges o On 
        a.RN = o.RN - 1
)

Select Max(Accum + datediff(Minute, CurStart, CurEnd)) From AccumulateMinutes 

http://sqlfiddle.com/#!6/ac021/2

After doing some performance analysis, here's a hybrid CTE/table variable version that performs better than anything except the cursor based approach

Create Function dbo.AvailMinutesHybrid(@UserID int) Returns Int As
Begin

Declare @UserRanges Table (
  RN int not null primary key, 
  AvailStart datetime, 
  AvailEnd datetime
)
Declare @Ret int = Null

;With OrderedRanges as (
  Select
    Row_Number() Over (Partition By UserID Order By AvailStart) AS RN,
    AvailStart,
    AvailEnd
  From
    dbo.Available
  Where
    UserID = @UserID
)
Insert Into @UserRanges Select * From OrderedRanges


;With AccumulateMinutes (RN,Accum, CurStart, CurEnd) as (
  Select
    RN, 0, AvailStart, AvailEnd
  From
    @UserRanges
  Where 
    RN = 1
  Union All
  Select
    o.RN, 
    a.Accum + Case When o.AvailStart <= a.CurEnd Then
        0
      Else 
        DateDiff(Minute, a.CurStart, a.CurEnd)
      End,
    Case When o.AvailStart <= a.CurEnd Then 
        a.CurStart
      Else
        o.AvailStart
      End,
    Case When o.AvailStart <= a.CurEnd Then
        Case When a.CurEnd > o.AvailEnd Then a.CurEnd Else o.AvailEnd End
      Else
        o.AvailEnd
      End
  From
    AccumulateMinutes a
        Inner Join 
    @UserRanges o On 
        a.RN + 1 = o.RN
)

Select 
  @Ret = Max(Accum + datediff(Minute, CurStart, CurEnd)) 
From 
  AccumulateMinutes 
Option
  (MaxRecursion 0)

Return @Ret

End

http://sqlfiddle.com/#!6/bfd94



回答2:

Here's another way of doing it with a cursor. I feel this techinique should be adaptable to a CTE, but I can't figure out how to do it

The method is to arrange each range by start time Then we build a range that coalesces ranges in order, until we find a range that doesn't overlap our coalesced range. We then calculate how many minutes are in the coalesced range, and remember this We carry on with the next ranges, again coalesing any that overlap. We accumulate minutes each time we get a non overlapping start point At the end we add the accumulated minutes onto the length of the last range

It's fairly easy to see that because of the order, once a range is distinct from what's gone before then no further ranges could overlap what's gone before, as their start dates are all greater.

Declare
  @UserID int = 456,
  @CurStart datetime, -- our current coalesced range start
  @CurEnd datetime, -- our current coalsced range end
  @AvailStart datetime, -- start or range for our next row of data
  @AvailEnd datetime, -- end of range for our next row of data
  @AccumMinutes int = 0 -- how many minutes so far accumulated by distinct ranges

Declare MinCursor Cursor Fast_Forward For
Select
  AvailStart, AvailEnd
From
  dbo.Available
Where
  UserID = @UserID
Order By
  AvailStart

Open MinCursor

Fetch Next From MinCursor Into @AvailStart, @AvailEnd
Set @CurStart = @AvailStart
Set @CurEnd = @AvailEnd

While @@Fetch_Status = 0
Begin
  If @AvailStart <= @CurEnd -- Ranges Overlap, so coalesce and continue
    Begin
    If @AvailEnd > @CurEnd 
      Set @CurEnd = @AvailEnd
    End
  Else -- Distinct range, coalesce minutes from previous range
  Begin
    Set @AccumMinutes = @AccumMinutes + DateDiff(Minute, @CurStart, @CurEnd)
    Set @CurStart = @AvailStart -- Start coalescing a new range
    Set @CurEnd = @AvailEnd
  End
  Fetch Next From MinCursor Into @AvailStart, @AvailEnd
End

Select @AccumMinutes + DateDiff(Minute, @CurStart, @CurEnd) As TotalMinutes

Close MinCursor
Deallocate MinCursor;

http://sqlfiddle.com/#!6/3483c/15



回答3:

The main problem is that you can have chains of overlapping entries, so you need to combine an indefinite amount of times to remove all the overlap - this is more suited to a procedural method than SQL. But if you would prefer to not use temporary tables, here's a CTE method - keep in mind that CTEs can only recurse a given number of times, so if you have any particularly long chains, it will fail.

WITH MergedAvailable
AS
(
  SELECT Available.UserID, Available.AvailStart, MAX(Available.AvailEnd) AS AvailEnd
    FROM Available
   WHERE (
           SELECT COUNT(*)
             FROM Available AS InnerAvailable
            WHERE InnerAvailable.AvailStart < Available.AvailStart
                  AND
                  InnerAvailable.AvailEnd >= Available.AvailStart
         ) = 0
   GROUP BY Available.UserID, Available.AvailStart
  UNION ALL
  SELECT MergedAvailable.UserID, MergedAvailable.AvailStart,
         LongestExtensionToAvailableInterval.NewIntervalEnd
    FROM MergedAvailable
   CROSS APPLY GetLongestExtensionToAvailableInterval(MergedAvailable.UserID,
               MergedAvailable.AvailStart,
               MergedAvailable.AvailEnd) AS LongestExtensionToAvailableInterval
   WHERE LongestExtensionToAvailableInterval.NewIntervalEnd IS NOT NULL
)

SELECT SUM(DATEDIFF(MINUTE,
                    FinalAvailable.AvailStart,
                    FinalAvailable.AvailEnd)) AS MinsAvailable
  FROM (
         SELECT MergedAvailable.UserID, MergedAvailable.AvailStart,
                MAX(MergedAvailable.AvailEnd) AS AvailEnd
           FROM MergedAvailable
          GROUP BY MergedAvailable.UserID, MergedAvailable.AvailStart
       ) AS FinalAvailable

This table function is required:

CREATE FUNCTION GetLongestExtensionToAvailableInterval
(
  @UserID int,
  @CurrentIntervalStart datetime,
  @CurrentIntervalEnd datetime
)
RETURNS TABLE
AS
RETURN 
  SELECT MAX(Available.AvailEnd) AS NewIntervalEnd
    FROM Available
   WHERE Available.UserID = @UserID
         AND
         Available.AvailStart > @CurrentIntervalStart
         AND
         Available.AvailStart <= @CurrentIntervalEnd
         AND
         Available.AvailEnd > @CurrentIntervalEnd

The general idea is that it starts from all ranges where the start of the range isn't overlapping anything, and then with every recursion it extends the current range to the furthest extent of the currently overlapping ranges. The table function is needed to determine the furthest extent, as recursing sections of CTEs are not allowed to included plain aggregates.

With the data you've provided, the starting rows are:

456 2012-11-19 16:00 2012-11-19 17:10
456 2012-11-19 17:30 2012-11-19 18:10

The only row which ends up being added via the recursion is:

456 2012-11-19 17:30 2012-11-19 18:30

For the sake of the example, say you had a row with ID 7 which went from 18:20 to 19:20. Then there would be a second recursion which brought back the row:

456 2012-11-19 17:30 2012-11-19 19:20

So while the query will get to the start and end of each overlapping range, it will also be bringing back all the intermediate stages. This is why we need to take the aggregate maximum end date for each start date after the CTE, to remove them.



回答4:

Condition t1.availStart > t2.availEnd OR t1.availEnd < t2.availStart check period which will never be crossed. If it is crossed then minimum availStart or maximum availEnd esle availStart or availEnd.

Probably more than one crossing period.
In your case it 
16:00:00 - 17:10:00 includes the ranges:16:00:00 - 16:50:00,
                                        16:00:00 - 16:50:00,
                                        16:00:00 - 17:00:00,
                                        16:00:00 - 17:10:00
17:30:00 - 18:30:00 includes the ranges:17:30:00 - 18:10:00,
                                        18:00:00 - 18:30:00

UPDATE 21.11.2012; 30.11.2012; 04.01.2013

CREATE FUNCTION dbo.Overlap
 (
  @availStart datetime,
  @availEnd datetime,
  @availStart2 datetime,
  @availEnd2 datetime
  )
RETURNS TABLE
RETURN
  SELECT CASE WHEN @availStart >= @availEnd2 OR @availEnd <= @availStart2
              THEN @availStart ELSE
                               CASE WHEN @availStart > @availStart2 THEN @availStart2 ELSE @availStart END
                               END AS availStart,
         CASE WHEN @availStart >= @availEnd2 OR @availEnd <= @availStart2
              THEN @availEnd ELSE
                             CASE WHEN @availEnd > @availEnd2 THEN @availEnd ELSE @availEnd2 END
                             END AS availEnd

;WITH cte AS
 (
  SELECT userID, availStart, availEnd, ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY AvailStart) AS Id
  FROM dbo.test53
  ), cte2 AS
 (
  SELECT Id, availStart, availEnd
  FROM cte
  WHERE Id = 1
  UNION ALL
  SELECT c.Id, o.availStart, o.availEnd
  FROM cte c JOIN cte2 ct ON c.Id = ct.Id + 1
             CROSS APPLY dbo.Overlap(c.availStart, c.availEnd, ct.availStart, ct.availEnd) AS o
  )
  SELECT TOP 1 SUM(DATEDIFF(minute, availStart, MAX(availEnd))) OVER()
  FROM cte2
  GROUP BY availStart

Demo on SQLFiddle



回答5:

Create Table #Available (
  ID int not null primary key,
  UserID int not null,
  AvailStart datetime not null,
  AvailEnd datetime not null
)


Insert Into #Available (ID,UserID, AvailStart, AvailEnd) Values
  (1,456, '2012-11-19 16:00', '2012-11-19 17:00'),
  (2,456, '2012-11-19 16:00', '2012-11-19 16:50'),
  (3,456, '2012-11-19 18:00', '2012-11-19 18:30'),
  (4,456, '2012-11-19 17:30', '2012-11-19 18:10'),
  (5,456, '2012-11-19 16:00', '2012-11-19 17:10'),
  (6,456, '2012-11-19 16:00', '2012-11-19 16:50'),
  (7,457, '2012-11-19 16:00', '2012-11-19 17:10'),
  (8,457, '2012-11-19 16:00', '2012-11-19 16:50');  
Select Distinct UserID 
into #users
from #Available


Create Table #mins(UserID int,atime datetime,aset tinyint )
Declare @start Datetime
Declare @end Datetime

Select @start=min(AvailStart),@end=max(AvailEnd) from #Available 
While @start<@end
    begin
     insert into #mins(UserID,atime) 
     Select UserID ,@Start from #users
     Select @start=DateAdd(mi,1,@start)
    end

update #mins set aset=1
from #Available
where atime>=AvailStart and atime<Availend and #mins.UserID = #Available.UserID


select UserID,SUM(aset) as [Minutes] 
from #mins
Group by UserID 
Drop table #Available
Drop table #mins
Drop table #users