Merge adjacent rows in SQL?

Posted 2019-01-25 18:00

Question:

I'm doing some reporting based on the blocks of time employees work. In some cases, the data contains two separate records for what really is a single block of time.

Here's a basic version of the table and some sample records:

EmployeeID
StartTime
EndTime

Data:

EmpID      Start         End
----------------------------
#1001   10:00 AM    12:00 PM
#1001    4:00 PM     5:30 PM
#1001    5:30 PM     8:00 PM

In the example, the last two records are contiguous in time. I'd like to write a query that combines any adjacent records so the result set is this:

EmpID      Start         End
----------------------------
#1001   10:00 AM    12:00 PM
#1001    4:00 PM     8:00 PM

Ideally, it should also be able to handle more than 2 adjacent records, but that is not required.
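Here is a minimal plain-Python sketch of the merge, done outside the database, to pin down the expected behaviour. It also covers chains of more than two adjacent records. It assumes times are stored as 24-hour "HH:MM" strings so they sort correctly as text, and merge_adjacent is a hypothetical helper. It only joins a block to the previous one when the start equals the previous end exactly, so overlapping blocks are not merged.

```python
from itertools import groupby
from operator import itemgetter

# Sample rows from the question, with times rewritten as 24-hour "HH:MM"
# strings (an assumption, so that plain string comparison orders them).
ROWS = [
    (1001, "10:00", "12:00"),
    (1001, "16:00", "17:30"),
    (1001, "17:30", "20:00"),
]


def merge_adjacent(rows):
    """Merge blocks whose start equals the previous block's end, per employee."""
    merged = []
    # Sort by employee, then start time, so contiguous blocks are consecutive.
    for emp_id, blocks in groupby(sorted(rows), key=itemgetter(0)):
        current = None
        for _, start, end in blocks:
            if current is not None and start == current[2]:
                # Contiguous: extend the running block instead of opening a new one.
                current = (emp_id, current[1], end)
            else:
                if current is not None:
                    merged.append(current)
                current = (emp_id, start, end)
        merged.append(current)
    return merged


print(merge_adjacent(ROWS))
# [(1001, '10:00', '12:00'), (1001, '16:00', '20:00')]
```

Because each block is compared with the running (already extended) block, a chain of any length collapses into one row.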

Answer 1:

This article presents several possible solutions to your question:

http://www.sqlmag.com/blog/puzzled-by-t-sql-blog-15/tsql/solutions-to-packing-date-and-time-intervals-puzzle-136851

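This one seems the most straightforward. It uses the article's dbo.Sessions table (username, starttime, endtime), so substitute your own table and column names. Because of the >= and <= comparisons, it merges overlapping intervals as well as intervals that merely touch: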

WITH StartTimes AS
(
  SELECT DISTINCT username, starttime
  FROM dbo.Sessions AS S1
  WHERE NOT EXISTS
    (SELECT * FROM dbo.Sessions AS S2
     WHERE S2.username = S1.username
       AND S2.starttime < S1.starttime
       AND S2.endtime >= S1.starttime)
),
EndTimes AS
(
  SELECT DISTINCT username, endtime
  FROM dbo.Sessions AS S1
  WHERE NOT EXISTS
    (SELECT * FROM dbo.Sessions AS S2
     WHERE S2.username = S1.username
       AND S2.endtime > S1.endtime
       AND S2.starttime <= S1.endtime)
)
SELECT username, starttime,
  (SELECT MIN(endtime) FROM EndTimes AS E
   WHERE E.username = S.username
     AND endtime >= starttime) AS endtime
FROM StartTimes AS S;
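To try the query without SQL Server, here is a sketch that runs it against an in-memory SQLite database from Python. Several assumptions are involved. SQLite stands in for SQL Server, so the dbo. prefix is dropped. An ORDER BY is added for a stable result. Times are stored as 24-hour "HH:MM" text so they compare correctly. The pack_sessions helper is hypothetical.

```python
import sqlite3

# Answer 1's packing query, run on SQLite as a stand-in for SQL Server.
# Assumed changes: "dbo." prefix dropped, ORDER BY added, times as "HH:MM" text.
PACKING_QUERY = """
WITH StartTimes AS (
  -- starts not covered by (or touching the end of) another interval
  SELECT DISTINCT username, starttime
  FROM Sessions AS S1
  WHERE NOT EXISTS
    (SELECT * FROM Sessions AS S2
     WHERE S2.username = S1.username
       AND S2.starttime < S1.starttime
       AND S2.endtime >= S1.starttime)
),
EndTimes AS (
  -- ends not covered by (or touching the start of) another interval
  SELECT DISTINCT username, endtime
  FROM Sessions AS S1
  WHERE NOT EXISTS
    (SELECT * FROM Sessions AS S2
     WHERE S2.username = S1.username
       AND S2.endtime > S1.endtime
       AND S2.starttime <= S1.endtime)
)
-- pair each packed start with the earliest packed end at or after it
SELECT username, starttime,
  (SELECT MIN(endtime) FROM EndTimes AS E
   WHERE E.username = S.username
     AND endtime >= starttime) AS endtime
FROM StartTimes AS S
ORDER BY username, starttime;
"""

SAMPLE = [("1001", "10:00", "12:00"), ("1001", "16:00", "17:30"), ("1001", "17:30", "20:00")]


def pack_sessions(rows):
    """Load (username, starttime, endtime) rows into memory and pack them."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Sessions (username TEXT, starttime TEXT, endtime TEXT)")
        conn.executemany("INSERT INTO Sessions VALUES (?, ?, ?)", rows)
        return conn.execute(PACKING_QUERY).fetchall()
    finally:
        conn.close()


print(pack_sessions(SAMPLE))
# [('1001', '10:00', '12:00'), ('1001', '16:00', '20:00')]
```

Overlapping input such as 09:00-11:00 and 10:00-12:00 also collapses to a single 09:00-12:00 row, which is more than the question strictly asks for.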


Answer 2:

If this is strictly about adjacent rows (not overlapping ones), you could try the following method:

  1. Unpivot the timestamps.

  2. Keep only the timestamps that occur once per employee (a timestamp that appears twice is the end of one block and the start of the next).

  3. Pivot the remaining ones back, coupling every Start with the directly following End.

Or, in Transact-SQL, something like this:

WITH unpivoted AS (
  SELECT
    EmpID,
    event,
    dtime,
    count = COUNT(*) OVER (PARTITION BY EmpID, dtime)
  FROM atable
  UNPIVOT (
    dtime FOR event IN (StartTime, EndTime)
  ) u
)
, filtered AS (
  SELECT
    EmpID,
    event,
    dtime,
    rowno = ROW_NUMBER() OVER (PARTITION BY EmpID, event ORDER BY dtime)
  FROM unpivoted
  WHERE count = 1
)
, pivoted AS (
  SELECT
    EmpID,
    StartTime,
    EndTime
  FROM filtered
  PIVOT (
    MAX(dtime) FOR event IN (StartTime, EndTime)
  ) p
)
SELECT *
FROM pivoted
;

There's a demo for this query at SQL Fiddle.
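SQLite has no UNPIVOT or PIVOT operators, so here is a sketch of the same three steps for other engines. It emulates UNPIVOT with UNION ALL and PIVOT with MAX(CASE ...) grouped by employee and row number. These substitutions, SQLite itself, the "HH:MM" text times and the pack_unpivot helper are all assumptions. As the answer says, this handles strictly adjacent rows, not overlapping ones.

```python
import sqlite3

# Answer 2's unpivot / filter / pivot idea, sketched for SQLite.
# Assumed substitutes: UNION ALL for UNPIVOT, MAX(CASE ...) for PIVOT.
QUERY = """
WITH unpivoted AS (
  -- step 1: one row per timestamp, tagged with the column it came from
  SELECT EmpID, event, dtime,
         COUNT(*) OVER (PARTITION BY EmpID, dtime) AS cnt
  FROM (
    SELECT EmpID, 'StartTime' AS event, StartTime AS dtime FROM atable
    UNION ALL
    SELECT EmpID, 'EndTime' AS event, EndTime AS dtime FROM atable
  ) AS u
),
filtered AS (
  -- step 2: a timestamp seen twice is an internal boundary, so drop it;
  -- then number the remaining starts and ends separately in time order
  SELECT EmpID, event, dtime,
         ROW_NUMBER() OVER (PARTITION BY EmpID, event ORDER BY dtime) AS rowno
  FROM unpivoted
  WHERE cnt = 1
)
-- step 3: the n-th remaining start pairs with the n-th remaining end
SELECT EmpID,
       MAX(CASE WHEN event = 'StartTime' THEN dtime END) AS StartTime,
       MAX(CASE WHEN event = 'EndTime' THEN dtime END) AS EndTime
FROM filtered
GROUP BY EmpID, rowno
ORDER BY EmpID, StartTime;
"""

SAMPLE = [(1001, "10:00", "12:00"), (1001, "16:00", "17:30"), (1001, "17:30", "20:00")]


def pack_unpivot(rows):
    """Load (EmpID, StartTime, EndTime) rows into memory and merge adjacent ones."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE atable (EmpID INTEGER, StartTime TEXT, EndTime TEXT)")
        conn.executemany("INSERT INTO atable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(pack_unpivot(SAMPLE))
# [(1001, '10:00', '12:00'), (1001, '16:00', '20:00')]
```

Partitioning by EmpID keeps one employee's end time from cancelling another employee's identical start time.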



Answer 3:

I changed the names and types a little to keep the example short (table x, with id for the employee, t1 for the start time and t2 for the end time). This works, should be fast, and has no limit on the number of adjacent records in a chain:

with cte as (
  select 
    x1.id
    ,x1.t1
    ,x1.t2
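    -- bef = 1: no row ends where this one starts, so it begins a chain
    ,case when x2.t1 is null then 1 else 0 end as bef
    -- aft = 1: no row starts where this one ends, so it ends a chain
    ,case when x3.t1 is null then 1 else 0 end as aft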
  from x x1
  left join x x2 on x1.id=x2.id and x1.t1=x2.t2
  left join x x3 on x1.id=x3.id and x1.t2=x3.t1
  where x2.id is null
  or    x3.id is null
)

select 
  cteo.id
  ,cteo.t1
  ,isnull(z.t2,cteo.t2) as t2

from cte cteo
-- for a chain start with a successor, the next endpoint row in time is the chain's last row
outer apply (select top 1 * 
             from cte ctei 
             where cteo.id=ctei.id and cteo.aft=0 and ctei.t1>cteo.t1
             order by ctei.t1) z
where cteo.bef=1

And here is the fiddle for it: http://sqlfiddle.com/#!3/ad737/12/0
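Here is a sketch of the same endpoint-pairing query for SQLite, which has neither OUTER APPLY nor TOP. It assumes SQLite as a stand-in, "HH:MM" text times and a hypothetical pack_endpoints helper. In this adaptation, OUTER APPLY (SELECT TOP 1 ...) becomes a correlated scalar subquery with LIMIT 1, and ISNULL becomes COALESCE.

```python
import sqlite3

# Answer 3's query adapted for SQLite; table x(id, t1, t2) follows the answer.
QUERY = """
WITH cte AS (
  SELECT x1.id, x1.t1, x1.t2,
         -- bef = 1: no block ends where this one starts (start of a chain)
         CASE WHEN x2.t1 IS NULL THEN 1 ELSE 0 END AS bef,
         -- aft = 1: no block starts where this one ends (end of a chain)
         CASE WHEN x3.t1 IS NULL THEN 1 ELSE 0 END AS aft
  FROM x AS x1
  LEFT JOIN x AS x2 ON x1.id = x2.id AND x1.t1 = x2.t2
  LEFT JOIN x AS x3 ON x1.id = x3.id AND x1.t2 = x3.t1
  WHERE x2.id IS NULL OR x3.id IS NULL   -- keep only chain endpoints
)
SELECT cteo.id, cteo.t1,
       COALESCE(
         -- a chain start that continues (aft = 0): the next endpoint row
         -- in time is the chain's last block, so take its end time
         (SELECT ctei.t2 FROM cte AS ctei
          WHERE ctei.id = cteo.id AND cteo.aft = 0 AND ctei.t1 > cteo.t1
          ORDER BY ctei.t1 LIMIT 1),
         cteo.t2) AS t2
FROM cte AS cteo
WHERE cteo.bef = 1
ORDER BY cteo.id, cteo.t1;
"""

SAMPLE = [(1001, "10:00", "12:00"), (1001, "16:00", "17:30"), (1001, "17:30", "20:00")]


def pack_endpoints(rows):
    """Load (id, t1, t2) rows into memory and merge adjacent ones."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE x (id INTEGER, t1 TEXT, t2 TEXT)")
        conn.executemany("INSERT INTO x VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(pack_endpoints(SAMPLE))
# [(1001, '10:00', '12:00'), (1001, '16:00', '20:00')]
```

Middle blocks of a chain have both a predecessor and a successor, so the WHERE clause in cte drops them. That is why no recursion is needed however long the chain is.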



Answer 4:

An option using an inline table-valued function and a recursive CTE:

CREATE FUNCTION dbo.Overlap
 (
  @availStart datetime,
  @availEnd datetime,
  @availStart2 datetime,
  @availEnd2 datetime
  )
RETURNS TABLE
RETURN
  SELECT CASE WHEN @availStart > @availEnd2 OR @availEnd < @availStart2
              THEN @availStart ELSE
                               CASE WHEN @availStart > @availStart2 THEN @availStart2 ELSE @availStart END
                               END AS availStart,
         CASE WHEN @availStart > @availEnd2 OR @availEnd < @availStart2
              THEN @availEnd ELSE
                             CASE WHEN @availEnd > @availEnd2 THEN @availEnd ELSE @availEnd2 END
                             END AS availEnd

;WITH cte AS
 (
  SELECT EmpID, Start, [End], ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY Start) AS Id
  FROM dbo.TableName
  ), cte2 AS
 (
  SELECT Id, EmpID, Start, [End]
  FROM cte
  WHERE Id = 1
  UNION ALL
  SELECT c.Id, c.EmpID, o.availStart, o.availEnd
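  -- join on EmpID as well, so rows of different employees are not chained together
  FROM cte c JOIN cte2 ct ON c.EmpID = ct.EmpID AND c.Id = ct.Id + 1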
             CROSS APPLY dbo.Overlap(c.Start, c.[End], ct.Start, ct.[End]) AS o
  )
  SELECT EmpID, Start, MAX([End]) AS [End]
  FROM cte2
  GROUP BY EmpID, Start
  OPTION (MAXRECURSION 0) -- lift the default limit of 100 recursion levels

Demo on SQLFiddle
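Here is a sketch of the same recursive fold for SQLite, under several assumptions. SQLite stands in for SQL Server. The dbo.Overlap function is inlined as CASE expressions, since SQLite has no table-valued functions or CROSS APPLY. Times are "HH:MM" text, TableName and pack_recursive are placeholders, and the recursive step matches on EmpID as well as Id so each employee is folded separately.

```python
import sqlite3

# Answer 4's approach for SQLite: number each employee's blocks, then walk
# them in order, folding each block into the previous (possibly merged) one
# when they overlap or touch.
QUERY = """
WITH RECURSIVE
cte AS (
  SELECT EmpID, Start, "End",
         ROW_NUMBER() OVER (PARTITION BY EmpID ORDER BY Start) AS Id
  FROM TableName
),
cte2 AS (
  SELECT Id, EmpID, Start, "End" FROM cte WHERE Id = 1
  UNION ALL
  SELECT c.Id, c.EmpID,
         -- disjoint from the previous block: keep own start, else the earlier start
         CASE WHEN c.Start > ct."End" OR c."End" < ct.Start THEN c.Start
              WHEN c.Start < ct.Start THEN c.Start ELSE ct.Start END,
         -- disjoint: keep own end, else the later of the two ends
         CASE WHEN c.Start > ct."End" OR c."End" < ct.Start THEN c."End"
              WHEN c."End" > ct."End" THEN c."End" ELSE ct."End" END
  FROM cte AS c
  JOIN cte2 AS ct ON c.EmpID = ct.EmpID AND c.Id = ct.Id + 1
)
-- every block in a merged run ends up carrying the run's start, so group on it
SELECT EmpID, Start, MAX("End") AS "End"
FROM cte2
GROUP BY EmpID, Start
ORDER BY EmpID, Start;
"""

SAMPLE = [(1001, "10:00", "12:00"), (1001, "16:00", "17:30"), (1001, "17:30", "20:00")]


def pack_recursive(rows):
    """Load (EmpID, Start, End) rows into memory and fold adjacent ones."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute('CREATE TABLE TableName (EmpID INTEGER, Start TEXT, "End" TEXT)')
        conn.executemany("INSERT INTO TableName VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(pack_recursive(SAMPLE))
# [(1001, '10:00', '12:00'), (1001, '16:00', '20:00')]
```

Each step compares the current block with the previous, already merged block rather than the previous raw row. This is what lets a run of any length collapse to one start time.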



Answer 5:

A CTE with a cumulative sum (IIF and LAG require SQL Server 2012 or later):

DECLARE @t TABLE(EmpId INT, Start TIME, Finish TIME)
INSERT INTO @t (EmpId, Start, Finish)
VALUES
    (1001, '10:00 AM', '12:00 PM'),
    (1001, '4:00 PM', '5:30 PM'),
    (1001, '5:30 PM', '8:00 PM')

;WITH rowind AS (
    SELECT EmpId, Start, Finish,
        -- IIF returns 1 for each row that should generate a new row in the final result
        IIF(Start = LAG(Finish, 1) OVER(PARTITION BY EmpId ORDER BY Start), 0, 1) newrow
    FROM @t),
    groups AS (
    SELECT EmpId, Start, Finish,
        -- Cumulative sum
        SUM(newrow) OVER(PARTITION BY EmpId ORDER BY Start) csum
    FROM rowind)

SELECT
    EmpId,
    MIN(Start) Start,
    MAX(Finish) Finish
FROM groups
GROUP BY EmpId, csum
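Here is a sketch of the same gaps-and-islands query on SQLite 3.25 or later (needed for LAG and window SUM), run from Python. It assumes SQLite as a stand-in, "HH:MM" text times and a hypothetical pack_cumsum helper. IIF is written as a CASE expression, and the second CTE is renamed islands to steer clear of GROUPS, which SQLite uses as a window-frame keyword.

```python
import sqlite3

# Answer 5's cumulative-sum query adapted for SQLite.
QUERY = """
WITH rowind AS (
  SELECT EmpId, Start, Finish,
         -- 1 when this block does not start exactly where the previous one ended
         CASE WHEN Start = LAG(Finish, 1) OVER (PARTITION BY EmpId ORDER BY Start)
              THEN 0 ELSE 1 END AS newrow
  FROM t
),
islands AS (
  SELECT EmpId, Start, Finish,
         -- running total of newrow: all blocks of one contiguous run share a value
         SUM(newrow) OVER (PARTITION BY EmpId ORDER BY Start) AS csum
  FROM rowind
)
SELECT EmpId, MIN(Start) AS Start, MAX(Finish) AS Finish
FROM islands
GROUP BY EmpId, csum
ORDER BY EmpId, MIN(Start);
"""

SAMPLE = [(1001, "10:00", "12:00"), (1001, "16:00", "17:30"), (1001, "17:30", "20:00")]


def pack_cumsum(rows):
    """Load (EmpId, Start, Finish) rows into memory and merge adjacent ones."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (EmpId INTEGER, Start TEXT, Finish TEXT)")
        conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(pack_cumsum(SAMPLE))
# [(1001, '10:00', '12:00'), (1001, '16:00', '20:00')]
```

For the sample, newrow is 1, 1, 0, so csum is 1, 2, 2 and the last two blocks fall into the same group.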