Drive EndDate of Current Row From StarDate of Next

2019-07-12 05:03发布

问题:

Can some one please help me with how to create end date from start date.

Products referred to a company for testing while the product with the company they carry out multiple tests on different dates and record the test date to establish the product condition i.e. (outcomeID). I need to establish the StartDate which is the testDate and EndDate which is the start date of the next row. But if multiple consecutive tests resulted in the same OutcomeID I need to return only one row with the StartDate of the first test and the end date of the last test. In another word if the outcomeID did not change over a few consecutive tests. Here is my data set


DECLARE @ProductTests TABLE

( RequestID int not null, ProductID int not null, TestID int not null, TestDate datetime null, OutcomeID int ) insert into @ProductTests (RequestID ,ProductID ,TestID ,TestDate ,OutcomeID ) select 1,2,22,'2005-01-21',10 union all select 1,2,42,'2007-03-17',10 union all select 1,2,45,'2010-12-25',10 union all select 1,2,325,'2011-01-14',13 union all select 1,2,895,'2011-08-10',15 union all select 1,2,111,'2011-12-23',15 union all select 1,2,636,'2012-05-02',10 union all select 1,2,554,'2012-11-08',17

--select *from @producttests


RequestID   ProductID   TestID    TestDate        OutcomeID
1               2           22    2005-01-21         10
1               2           42    2007-03-17         10
1               2           45    2010-12-25         10
1               2           325   2011-01-14         13
1               2           895   2011-08-10         15
1               2           111   2011-12-23         15
1               2           636   2012-05-02         10
1               2           554   2012-11-08         17
And this is what I need to achieve.


RequestID ProductID  StartDate        EndDate           OutcomeID
1            2       2005-01-21       2011-01-14        10
1            2       2011-01-14       2011-08-10        13
1            2       2011-08-10       2012-05-02        15
1            2       2012-05-02       2012-11-08        10
1            2       2012-11-08       NULL              17

As you see from the dataset the first three tests (22, 42, and 45) all resulted in OutcomeID 10 so in my result I only need start date of test 22 and end date of test 45 which is the start date of test 325.As you see in test 636 outcomeID has gone back to 10 from 15 so it needs to be returned too.

--This is what I have managed to achieve at the moment using the following script


select T1.RequestID,T1.ProductID,T1.TestDate AS StartDate
       ,MIN(T2.TestDate) AS EndDate ,T1.OutcomeID 
from   @producttests T1
left join @ProductTests T2 ON T1.RequestID=T2.RequestID 
and T1.ProductID=T2.ProductID and T2.TestDate>T1.TestDate

group by T1.RequestID,T1.ProductID ,T1.OutcomeID,T1.TestDate

order by T1.TestDate

Result:


RequestID   ProductID   StartDate   EndDate       OutcomeID
1                  2    2005-01-21  2007-03-17         10
1                  2    2007-03-17  2010-12-25         10
1                  2    2010-12-25  2011-01-14         10
1                  2    2011-01-14  2011-08-10         13
1                  2    2011-08-10  2011-12-23         15
1                  2    2011-12-23  2012-05-02         15
1                  2    2012-05-02  2012-11-08         10
1                  2    2012-11-08  NULL               17

回答1:

nov 7 but still not answered so here is my solution not soo pretty but works

my hint is read about windowing , ranking and aggregate functions like row_number, rank , avg, sum etc. those are essential when you want to write raports , and becoming quite powerfull in sql server 2012

i have also used CTE (common table expression) but it can be written as subquery or temporary table

;with cte ( ida, requestid, productid, testid, testdate, outcomeid) as
(
-- select rows where the outcome id is changing 
select b.* from 
(select  ROW_NUMBER() over( partition by requestid, productid order by testDate) as id, * from #ProductTests)a 
right outer join 
(select  ROW_NUMBER() over(partition by requestid, productid order by testDate) as id, * from #ProductTests) b
on a.requestID = b.requestID and a.productID = b.productID and a.id +1  = b.id 
where 1=1 
--or a.id = 1
and a.outcomeid <> b.outcomeid or b.outcomeid is null or a.id is null
)
select --*
a.RequestID,a.ProductID,a.TestDate AS StartDate   ,MIN(b.TestDate) AS EndDate ,a.OutcomeID  
from  cte a left join cte b on a.requestid = b.requestid and a.productid = b.productid and a.testdate < b.testdate
group by a.RequestID,a.ProductID ,a.OutcomeID,a.TestDate
order by StartDate


回答2:

Actually, there seem to be two problems in your question. One is how to group sequential (based on specific criteria) rows containing the same value. The other is the one actually spelled out in your title, i.e. how to use the next row's StartDate as the current row's EndDate.

Personally, I would solve these two problems in the order I mentioned them, so I would first address the grouping problem. One way to group the data properly in this case would be to use double ranking like this:

WITH partitioned AS (
  SELECT
    *,
    grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID            ORDER BY TestDate)
        - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
  FROM @ProductTests
)
, grouped AS (
  SELECT
    RequestID,
    ProductID,
    StartDate = MIN(TestDate),
    OutcomeID
  FROM partitioned
  GROUP BY
    RequestID,
    ProductID,
    OutcomeID,
    grp
)
SELECT *
FROM grouped
;

This should give you the following output for your data sample:

RequestID  ProductID  StartDate   OutcomeID
---------  ---------  ----------  ---------
1          2          2005-01-21  10
1          2          2011-01-14  13
1          2          2011-08-10  15
1          2          2012-05-02  10
1          2          2012-11-08  17

Obviously, one thing is still missing, and it's EndDate, and now is the right time to care about it. Use ROW_NUMBER() once again, to rank the result set of the grouped CTE, then use the rankings in the join condition when joining the result set with itself (using an outer join):

WITH partitioned AS (
  SELECT
    *,
    grp = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID            ORDER BY TestDate)
        - ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID, OutcomeID ORDER BY TestDate)
  FROM @ProductTests
)
, grouped AS (
  SELECT
    RequestID,
    ProductID,
    StartDate = MIN(TestDate),
    OutcomeID,
    rnk = ROW_NUMBER() OVER (PARTITION BY RequestID, ProductID ORDER BY MIN(TestDate))
  FROM partitioned
  GROUP BY
    RequestID,
    ProductID,
    OutcomeID,
    grp
)
SELECT
  g1.RequestID,
  g1.ProductID,
  g1.StartDate,
  g2.StartDate AS EndDate,
  g1.OutcomeID
FROM grouped g1
LEFT JOIN grouped g2
  ON g1.RequestID = g2.RequestID
 AND g1.ProductID = g2.ProductID
 AND g1.rnk = g2.rnk - 1
;

You can try this query at SQL Fiddle to verify that it returns the output you are after.



标签: tsql