Rank rows in a column under conditions on a differ

2019-09-14 07:23发布

问题:

I have the following dataset:

 id |   date   | state
-----------------------
  1 | 01/01/17 | high
  1 | 02/01/17 | high
  1 | 03/01/17 | high
  1 | 04/01/17 | miss
  1 | 05/01/17 | high
  2 | 01/01/17 | miss
  2 | 02/01/17 | high
  2 | 03/01/17 | high
  2 | 04/01/17 | miss
  2 | 05/01/17 | miss
  2 | 06/01/17 | high

I want to create a column rank_state which ranks, within groups of id, the entries as per increasing date (starting from rank 0) which do not have the state of "miss". Furthermore, the rank repeats itself if the entry has a state of "miss". The output should look like:

 id |   date   | state | rank_state
------------------------------------
  1 | 01/01/17 | high  |    0
  1 | 02/01/17 | high  |    1
  1 | 03/01/17 | high  |    2
  1 | 04/01/17 | miss  |    2
  1 | 05/01/17 | high  |    3
  2 | 01/01/17 | miss  |    0
  2 | 02/01/17 | high  |    0
  2 | 03/01/17 | high  |    1
  2 | 04/01/17 | miss  |    1
  2 | 05/01/17 | miss  |    1
  2 | 06/01/17 | high  |    2

For example, the 4th row has a rank of 2 since it's state is "miss", i.e. it repeats the rank of row 3 (the same applies to rows 9 and 10). Please note that rows 6 and 7 should have rank 0.

I have tried the following: ,(case when state is not in ('miss') then (rank() over (partition by id order by date desc) - 1) end) as state_rank and ,rank() over (partition by id order by case when state is not in ('miss') then date end) as state_rank but neither give me the desired result. Any ideas would be very helpful.

回答1:

More than likely you want:

SELECT *, 
       GREATEST(
         COUNT(case when state != 'miss' then 1 else null end) 
           OVER(PARTITION BY id ORDER BY date) - 1,
         0
       ) as "state_rank"
FROM tbl;

SQL Fiddle

Basically:

  • make your window frame (partition) overid
  • only count the ones that aren't 'miss'
  • because it could be a negative number if starting the record, you can slap on the GREATEST to use 0 (preventing negatives)


回答2:

Just add frame_clause to vol7ron's answer since Redshift requires it :

select *
    , GREATEST(COUNT(case when state != 'miss' then 1 else null end) 
            OVER(PARTITION BY id order by date rows between unbounded preceding and current row) -1 , 0 ) as state_rank
from tbl;