How to group/rank records based on a changing valu

2019-04-14 00:32发布

问题:

I have the following table ordered by Id, Year DESC

Id   Year   Valid
1    2011   1
1    2010   1
1    2009   0
1    2002   1
4    2013   1
4    2012   1
4    2011   1
etc.

What I would like to have is an extra rank field like:

Id   Year   Valid   Rank
1    2011   1       1
1    2010   1       1
1    2009   0       2
1    2002   1       3
4    2013   1       1
4    2012   1       1
4    2011   1       1
etc.

Basically per Id alternating ranks for every change in the Valid field. In such a way I can query on the rank=1 field to have all Valid=1 records for each and every Id up to the first Valid=0. Or is there an easier way to select the first top number of records matching a certain condition (For Id=1 only the first two records). I already played with ROW_NUMBER(), RANK() and PARTITION BY but I can't seem to get it to work. Must avoid nesting of queries since the actual query is run against a large database.

Anyone any ideas?

Thanks and cheers, Nyquist

回答1:

yes, using Left JOIN we can do that. see the below code and result.

first image is actual data inserted and second image is expected result.

DECLARE @t TABLE
(
    id      INT
    ,_YEAR  INT
    ,valid  TINYINT
)
INSERT INTO @t( id, [_YEAR], valid )
            SELECT 1,2011,1
UNION ALL   SELECT 1,2010,1
UNION ALL   SELECT 1,2009,0
UNION ALL   SELECT 1,2002,1
UNION ALL   SELECT 4,2013,1
UNION ALL   SELECT 4,2012,1
UNION ALL   SELECT 4,2011,1
UNION ALL   SELECT 5,2013,0
UNION ALL   SELECT 5,2011,1
UNION ALL   SELECT 5,2010,1
UNION ALL   SELECT 6,2010,1
UNION ALL   SELECT 6,2011,0
UNION ALL   SELECT 6,2014,1


SELECT  q1.*
FROM @t q1
LEFT JOIN 
(
    SELECT id,MAX(_YEAR) ZeroYear
    FROM @t
    WHERE valid = 0
    GROUP BY id
)q2
    ON q1.id=q2.id
WHERE 
(q2.ID IS NULL)
OR
(q2.id IS NOT NULL AND q1.id IS NOT NULL AND q1.id=q2.id AND q1.[_YEAR] > q2.ZeroYear)

Edit-1: In above query for the column ZeroYear, previously i did MIN(_YEAR) but as you can see in the comment from "Andriy M" instead of MIN right function is MAX.



回答2:

This is somewhat similar to @Anup Shah's suggestion but doesn't use a join and instead uses a window aggregate function:

WITH derived AS (
  SELECT
    Id,
    Year,
    Valid,
    LatestInvalidYear = ISNULL(
      MAX(CASE Valid WHEN 0 THEN Year END) OVER (PARTITION BY Id),
      0
    )
  FROM atable
)
SELECT
  Id,
  Year,
  Valid
FROM derived
WHERE Year > LatestInvalidYear
;

Basically, the window MAX calculates the latest Valid = 0 year per Id. If no such year is found, MAX results in a NULL, which is replaced with a 0 by ISNULL. So, for your example, the derived set would be returned as this:

Id   Year   Valid   LatestInvalidYear
--   ----   -----   -----------------
1    2011   1       2009
1    2010   1       2009
1    2009   0       2009
1    2002   1       2009
4    2013   1       0
4    2012   1       0
4    2011   1       0

Obviously, you can now easily apply the filter Year > LatestInvalidYear to get the required rows, and that is exacly what the main SELECT does.



回答3:

If you're using SQL 2012, you can use lag

select id, year, valid,
    case when ch = 0 then 1 else lag(ch,1,0) over (order by id, year desc) + 2 end rank
from
    (
        select 
            * ,
            abs(valid - lag(valid,1,1) over (order by id, year desc)) as ch
        from YourTable
    ) t