DENSE_RANK() without duplication

2019-02-17 16:38发布

问题:

Here's what my data looks like:

| col1 | col2 | denserank | whatiwant |
|------|------|-----------|-----------|
| 1    | 1    | 1         | 1         |
| 2    | 1    | 1         | 1         |
| 3    | 2    | 2         | 2         |
| 4    | 2    | 2         | 2         |
| 5    | 1    | 1         | 3         |
| 6    | 2    | 2         | 4         |
| 7    | 2    | 2         | 4         |
| 8    | 3    | 3         | 5         |

Here's the query I have so far:

SELECT col1, col2, DENSE_RANK() OVER (ORDER BY COL2) AS [denserank]
FROM [table1]
ORDER BY [col1] asc

What I'd like to achieve is for my denserank column to increment every time there is a change in the value of col2 (even if the value itself is reused). I can't actually order by the column I have denserank on, so that won't work). See the whatiwant column for an example.

Is there any way to achieve this with DENSE_RANK()? Or is there an alternative?

回答1:

Try this using window functions:

with t(col1  ,col2) as (
select 1 , 1 union all  
select 2 , 1 union all  
select 3 , 2 union all  
select 4 , 2 union all  
select 5 , 1 union all  
select 6 , 2 union all  
select 7 , 2 union all  
select 8 , 3
)
select t.col1,
    t.col2,
    sum(x) over (
        order by col1
        ) whatyouwant
from (
    select t.*,
        case 
            when col2 = lag(col2) over (
                    order by col1
                    )
                then 0
            else 1
            end x
    from t
    ) t
order by col1;

Produces:

It does a single table read and forms group of consecutive equal col2 values in increasing order of col1 and then finds dense rank on that.

  • x: Assign value 0 if previous row's col2 is same as this row's col2 (in order of increasing col1) otherwise 1
  • whatyouwant: create groups of equal values of col2 in order of increasing col1 by doing an incremental sum of the value x generated in the last step and that's your output.


回答2:

I would do it with a recursive cte like this:

declare @Dept table (col1 integer, col2 integer)

insert into @Dept values(1, 1),(2, 1),(3, 2),(4, 2),(5, 1),(6, 2),(7, 2),(8, 3)

;with a as (
select col1, col2, 
ROW_NUMBER() over (order by col1) as rn
from @Dept),
s as 
(select col1, col2, rn, 1 as dr from a where rn=1
union all
select a.col1, a.col2, a.rn, case when a.col2=s.col2 then s.dr else s.dr+1 end as dr 
from a inner join s on a.rn=s.rn+1)
col1, col2, dr from s

result:

col1        col2        dr
----------- ----------- -----------
1           1           1
2           1           1
3           2           2
4           2           2
5           1           3
6           2           4
7           2           4
8           3           5

The ROW_NUMBER is only required in case your col1 values are not sequential. If they are you can use the recursive cte straight away



回答3:

Here is one way using SUM OVER(Order by) window aggregate function

SELECT col1,Col2,
       Sum(CASE WHEN a.prev_val = a.col2 THEN 0 ELSE 1 END) OVER(ORDER BY col1) AS whatiwant 
FROM   (SELECT col1,
               col2,
               Lag(col2, 1)OVER(ORDER BY col1) AS prev_val
        FROM   Yourtable) a
ORDER  BY col1; 

How it works:

LAG window function is used to find the previous col2 for each row ordered by col1

SUM OVER(Order by) will increment the number only when previous col2 is not equal to current col2



回答4:

I think this is possible in pure SQL using some gaps and islands tricks, but the path of least resistance might be to use a session variable combined with LAG() to keep track of when your computed dense rank changes value. In the query below, I use @a to keep track of the change in the dense rank, and when it changes this variable is incremented by 1.

DECLARE @a int
SET @a = 1
SELECT t.col1,
       t.col2,
       t.denserank,
       @a = CASE WHEN LAG(t.denserank, 1, 1) OVER (ORDER BY t.col1) = t.denserank
                 THEN @a
                 ELSE @a+1 END AS [whatiwant]
FROM
(
    SELECT col1, col2, DENSE_RANK() OVER (ORDER BY COL2) AS [denserank]
    FROM [table1]
) t
ORDER BY t.col1