Rank rows in a column under conditions on a differ

2019-09-14 07:03发布

I have the following dataset:

 id |   date   | state
-----------------------
  1 | 01/01/17 | high
  1 | 02/01/17 | high
  1 | 03/01/17 | high
  1 | 04/01/17 | miss
  1 | 05/01/17 | high
  2 | 01/01/17 | miss
  2 | 02/01/17 | high
  2 | 03/01/17 | high
  2 | 04/01/17 | miss
  2 | 05/01/17 | miss
  2 | 06/01/17 | high

I want to create a column rank_state which ranks, within groups of id, the entries as per increasing date (starting from rank 0) which do not have the state of "miss". Furthermore, the rank repeats itself if the entry has a state of "miss". The output should look like:

 id |   date   | state | rank_state
------------------------------------
  1 | 01/01/17 | high  |    0
  1 | 02/01/17 | high  |    1
  1 | 03/01/17 | high  |    2
  1 | 04/01/17 | miss  |    2
  1 | 05/01/17 | high  |    3
  2 | 01/01/17 | miss  |    0
  2 | 02/01/17 | high  |    0
  2 | 03/01/17 | high  |    1
  2 | 04/01/17 | miss  |    1
  2 | 05/01/17 | miss  |    1
  2 | 06/01/17 | high  |    2

For example, the 4th row has a rank of 2 since it's state is "miss", i.e. it repeats the rank of row 3 (the same applies to rows 9 and 10). Please note that rows 6 and 7 should have rank 0.

I have tried the following: ,(case when state is not in ('miss') then (rank() over (partition by id order by date desc) - 1) end) as state_rank and ,rank() over (partition by id order by case when state is not in ('miss') then date end) as state_rank but neither give me the desired result. Any ideas would be very helpful.

2条回答
Fickle 薄情
2楼-- · 2019-09-14 07:29

Just add frame_clause to vol7ron's answer since Redshift requires it :

select *
    , GREATEST(COUNT(case when state != 'miss' then 1 else null end) 
            OVER(PARTITION BY id order by date rows between unbounded preceding and current row) -1 , 0 ) as state_rank
from tbl;
查看更多
神经病院院长
3楼-- · 2019-09-14 07:39

More than likely you want:

SELECT *, 
       GREATEST(
         COUNT(case when state != 'miss' then 1 else null end) 
           OVER(PARTITION BY id ORDER BY date) - 1,
         0
       ) as "state_rank"
FROM tbl;

SQL Fiddle

Basically:

  • make your window frame (partition) overid
  • only count the ones that aren't 'miss'
  • because it could be a negative number if starting the record, you can slap on the GREATEST to use 0 (preventing negatives)
查看更多
登录 后发表回答