Identify sub-set of records based on date and rule

2019-05-11 11:40发布

问题:

I have a dataset that looks like this:

I need to identify the rows that have Linked set to 1 but ONLY where they are together when sorted by ToDate descending as in the picture.

In other words I want to be able to identify these records (EDITED):

This is a simplified dataset, in fact there will be many more records...

The logic that defines whether a record is linked is if the FromDate of a record is within 8 weeks of the ToDate of the preceeding date...but this is testData so may not be perfect

What's the best way to do that please?

回答1:

You can use LAG() and LEAD() analytic functions:

SELECT * FROM (
    SELECT t.*,
           LAG(t.linked,1,0) OVER(ORDER BY t.FromDate DESC) as rnk_1, --Next one
           LEAD(t.linked,1,0) OVER(ORDER BY t.FromDate DESC) as rnk_2, -- Last one,
           LEAD(t.linked,2,0) OVER(ORDER BY t.FromDate DESC) as rnk_3 -- Last two,
    FROM YourTable t) s
WHERE ((s.rnk_1 = 1 OR s.rnk_2 = 1) AND s.linked = 1) OR 
      (s.rnk_2 = 1 and s.rnk_3 = 1 and s.linked = 0)
ORDER BY s.FromDate DESC

This will result in records that have linked = 1 and the previous/next record is also 1.



回答2:

Using LAG and LEAD functions you can examine the previous/next row values given a sort criteria.

You can achieve your required dataset using the following DDL:

;
WITH    CTE_LagLead
          AS (
              SELECT    FromDate,
                        ToDate,
                        NoOfDays,
                        Weeks,
                        Linked,
                        LAG(Linked, 1, 0) OVER (ORDER BY ToDate DESC) LinkedLag,
                        LEAD(Linked, 1, 0) OVER (ORDER BY ToDate DESC) LinkedLead
              FROM      @table
             )
    SELECT  FromDate,
            ToDate,
            NoOfDays,
            Weeks,
            Linked
    FROM    CTE_LagLead
    WHERE   Linked = 1 AND
            (LinkedLag = 1 OR
             LinkedLead = 1)
    ORDER BY ToDate DESC;

See working example



回答3:

here is the answer I came up with:

Select 
    * 
from 
    #tmpAbsences 
where 
    idcol between 1 AND (
    Select TOP 1 idcol from #tmpAbsences where Linked=0)

this includes the row 7 in the below picture: