I have data in the format:
Date, FirstName, LastName, Unit
Jan1 , Bob , Guy , Home
Jan2 , Bob , Guy , Home
Jan3 , Bob , Guy , Home
Jan5 , Bob , Guy , Home
Jan6 , Bob , Guy , Home
Jan7 , Bob , Guy , Home
Jan8 , Bob , Guy , Offsite
Jan3 , Jane , Doe , Home
Jan4 , Jane , Doe , Home
Jan5 , Jane , Doe , Home
Jan9 , Bob , Guy , Home
Jan10, Bob , Guy , Home
Jan11, Bob , Guy , Home
Jan12, Jane , Doe , Home
Jan13, Jane , Doe , Home
Jan14, Jane , Doe , Home
and I want it in the format
DateStart, DateEnd, FirstName, LastName, Unit
Jan1 , Jan3 , Bob , Guy , Home
Jan5 , Jan7 , Bob , Guy , Home
Jan8 , Jan8 , Bob , Guy , Offsite
Jan3 , Jan5 , Jane , Doe , Home
Jan9 , Jan11 , Bob , Guy , Home
Jan12 , Jan14 , Jane , Doe , Home
Edit: Updated the data.
How can I convert the data easily?
This is a one time conversion.
Thanks for the comments/answers!
The SQL below will produce the desired output, But I'm not sure if you're not better off writing this is C#
Update
This has been updated to a proper gaps and island solution. This is based on the MSDN article Islands and Gaps in Sequential Numbers by Alexander Kozak. This could be improved on by using CTE's and also the Exists could be replaced with LEFT JOINs.
It should be noted that this relies on the Dates to not have any time component. If there were a time component it would have to be removed up front.
Output
Date enddate FirstName LastName Unit
----------------------- ----------------------- --------- -------- -------
2011-01-01 00:00:00.000 2011-01-03 00:00:00.000 Bob Guy Home
2011-01-03 00:00:00.000 2011-01-05 00:00:00.000 Jane Doe Home
2011-01-05 00:00:00.000 2011-01-07 00:00:00.000 Bob Guy Home
2011-01-08 00:00:00.000 2011-01-08 00:00:00.000 Bob Guy Offsite
2011-01-09 00:00:00.000 2011-01-11 00:00:00.000 Bob Guy Home
2011-01-12 00:00:00.000 2011-01-14 00:00:00.000 Jane Doe Home
SQL Statement
SET NOCOUNT On
DECLARE @Test
Table (
Date datetime,
FirstName varchar(100),
LastName varchar(100),
Unit varchar(7))
INSERT INTO @Test VALUES ('01/01/2011' , 'Bob', 'Guy', 'Home')
INSERT INTO @Test VALUES ('01/02/2011' , 'Bob', 'Guy', 'Home')
INSERT INTO @Test VALUES ('01/03/2011' , 'Bob', 'Guy', 'Home')
INSERT INTO @Test VALUES ('01/05/2011' , 'Bob', 'Guy', 'Home')
INSERT INTO @Test VALUES ('01/06/2011' , 'Bob', 'Guy', 'Home')
INSERT INTO @Test VALUES ('01/07/2011' , 'Bob', 'Guy', 'Home')
INSERT INTO @Test VALUES ('01/08/2011' , 'Bob', 'Guy', 'Offsite')
INSERT INTO @Test VALUES ('01/03/2011' , 'Jane', 'Doe', 'Home')
INSERT INTO @Test VALUES ('01/04/2011' , 'Jane', 'Doe', 'Home')
INSERT INTO @Test VALUES ('01/05/2011' , 'Jane', 'Doe', 'Home')
INSERT INTO @Test VALUES ('01/09/2011' , 'Bob', 'Guy', 'Home')
INSERT INTO @Test VALUES ('01/10/2011' , 'Bob', 'Guy', 'Home')
INSERT INTO @Test VALUES ('01/11/2011' , 'Bob', 'Guy', 'Home')
INSERT INTO @Test VALUES ('01/12/2011' , 'Jane', 'Doe', 'Home')
INSERT INTO @Test VALUES ('01/13/2011' , 'Jane', 'Doe', 'Home')
INSERT INTO @Test VALUES ('01/14/2011' , 'Jane', 'Doe', 'Home')
SELECT
t1.Date,
MIN(t2.Date) enddate,
t1.FirstName,
t1.LastName,
t1.Unit
FROM
(SELECT *
FROM
@Test t1
WHERE
NOT EXISTS(SELECT * FROM @Test t2
WHERE
t1.firstName = t2.FirstName
AND t1.LastName = t2.LastName
AND t1.Unit = t2.Unit
and t1.Date - t2.Date = 1))
t1
INNER JOIN (SELECT * FROM @Test t1
WHERE
NOT EXISTS(SELECT * FROM @Test t2
WHERE
t1.firstName = t2.FirstName
AND t1.LastName = t2.LastName
AND t1.Unit = t2.Unit
and t2.Date - t1.Date = 1)) t2
ON
t1.firstName = t2.FirstName
AND t1.LastName = t2.LastName
AND t1.Unit = t2.Unit
AND t1.Date <= t2.Date
GROUP BY
t1.Date,
t1.FirstName,
t1.LastName,
t1.Unit
Using Conrad's test data and Itzik Ben Gan's approach!
;WITH base AS (
SELECT FirstName,
LastName,
Unit,
Date,
DATEDIFF(DAY,0,Date) -
DENSE_RANK() OVER (PARTITION BY FirstName, LastName, Unit
ORDER BY DATEDIFF(DAY,0,Date)) AS G
FROM @Test
)
SELECT FirstName, LastName, Unit, MIN(Date) DateStart,MAX(Date) DateEnd
FROM base
GROUP BY G, FirstName, LastName, Unit