create partition based on the difference between s

2019-09-04 04:52发布

问题:

I am using SQL Server 2012.

I want to create a row_number based on whether the index in subsequent rows are increasing by 1 or more. For example, say I have a table that looks like:

  event row_index
    1       24          
    2       25  
    3       26
    4       30
    5       31
    6       42 
    7       43
    8       44
    9       45

Then what I want to do is create a column at the end, called seq_ID:

event    row_index   seq_id
    1       24         1         
    2       25         1
    3       26         1
    4       30         2
    5       31         2
    6       42         3
    7       43         3
    8       44         3
    9       45         3

basically, the seq_id only chances if the difference between subsequent row indexes is > 1. I have tried to use:

rank() over (partition by 'not sure what to put here' )

回答1:

Try this:

    ;with cte as
    (select *, 
     coalesce(row_index - (lag(row_index) over (order by event)),1) diff
     from tbl
    ),

    cte2 as
    (select *, 
     (select max(diff) 
      from cte c 
      where c.row_index <= d.row_index
      ) minri
     from cte d
     )

    select event, row_index, minri, 
    dense_rank() over (order by minri) rn 
    from cte2
  • The first CTE gets the differences using the lag function (available from SQL Server 2012 onwards).
  • The next CTE calculates when the difference exceeds 1 and assigns all records after that point to a 'group', until the next difference <> 1 is found. This is the key step in grouping.
  • The last step is to use dense_rank over the indicator calculated in the previous step to get the row numbers as required.

This solution does have a limitation in that it will fail if the differences are not in increasing order i.e. if you have two more values in the sample data such as 52 and 53, it will classify them into group 3 instead of creating a new group.

Demo

Update: The below approach can overcome the above limitation:

    ;with cte as
    (select *, 
     coalesce(row_index - (lag(row_index) over (order by event)),1) diff
     from tbl)
    ,cte2 as
    (select *,
     diff - coalesce(diff - (lag(diff) over (order by event)),0) tmp
     from cte d)

     select event,row_index, 
     1 + sum(case when tmp >= diff then 0 else 1 end) over (order by event) risum
     from cte2

Again the first step remains the same. But in step 2, we check only for transition to a different value of the difference between successive values, instead of using a min/max function. The ranking then uses a conditional sum to assign a group for each value in the original data.

Demo

This can be further simplified to:

select event, row_index, 
sum(case when diff <= 1 then 0 else 1 end) over (order by event) as rb
from
(select *, 
 row_index - (lag(row_index) over (order by event)) diff
 from tbl
) s


回答2:

With a huge number of records it may performs badly, but give it a try:

;with T as (
 select event, rowindex, rowindex-row_number() over (order by event) as rn from YourTableName
)
select event, (select count(distinct rn)+1 from T where rn<TT.rn) as seq_id
 from T TT order by event