Finding the largest group of consecutive numbers w

Posted 2019-04-09 22:43

Question:

I have the following data, ordered by player_id and match_date. I would like to find the group of records with the longest streak of consecutive identical runs values (for example, 4 runs in three consecutive matches for player 1, from 2014-04-03 to 2014-04-12).

 player_id  match_date  runs
    1       2014-04-01    5
    1       2014-04-02    55       
    1       2014-04-03    4       
    1       2014-04-10    4       
    1       2014-04-12    4       
    1       2014-04-14    3       
    1       2014-04-19    4       
    1       2014-04-20    44               
    2       2014-04-01    23
    2       2014-04-02    23       
    2       2014-04-03    23       
    2       2014-04-10    23       
    2       2014-04-12    4       
    2       2014-04-14    3       
    2       2014-04-19    23       
    2       2014-04-20    1   

I have come up with the following SQL:

select *,
       row_number() over (partition by ranked.player_id, ranked.runs
                          order by ranked.match_date) as R
from (select player_id, match_date, runs
      from players
      order by 1, 2 desc
     ) ranked
order by ranked.player_id, match_date asc

But the rank carries over from earlier, non-adjacent rows with the same runs value. For example, the 4 runs on 2014-04-19 for player 1 should get rank 1 but gets rank 4, because the partition already contains three earlier rows with 4 runs. Similarly, the 23 runs for player 2 on 2014-04-19 should get rank 1 but gets rank 5, because that player already has four earlier rows with 23 runs.

How do I reset the rank back to 1 when the value of runs changes from its previous row?

Schema, data, SQL and the output are available on SQLFiddle.

Answer 1:

You can do this with window functions.

select player_id, runs, count(*) as numruns
from (select p.*,
             (row_number() over (partition by player_id order by match_date) -
              row_number() over (partition by player_id, runs order by match_date)
             ) as grp
      from players p
     ) pg
group by grp, player_id, runs
order by numruns desc
limit 1;

The key observation is that a streak of identical runs has this property: if you number each player's rows by date, and separately number the rows for each player and runs value by date, then the difference between the two numbers is constant across rows that share the same runs value and are adjacent in date order. That difference, together with player_id and runs, identifies a group that you can aggregate over to find the longest streak.

Here is the SQL Fiddle.
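
To make the row-number difference concrete, here is a minimal sketch that traces it by hand for player 1's rows from the question. It is a Python stand-in for the two row_number() calls, not the query itself; the names label_groups, labelled, streaks and longest are made up for this illustration, and the rows are assumed to be already sorted by match_date.

```python
from collections import Counter, defaultdict

# Player 1's rows from the question, already sorted by match_date.
rows = [
    ("2014-04-01", 5),
    ("2014-04-02", 55),
    ("2014-04-03", 4),
    ("2014-04-10", 4),
    ("2014-04-12", 4),
    ("2014-04-14", 3),
    ("2014-04-19", 4),
    ("2014-04-20", 44),
]


def label_groups(rows):
    """Return (match_date, runs, grp) with grp = rn_all - rn_per_runs."""
    seen = defaultdict(int)  # occurrences of each runs value so far
    labelled = []
    # rn_all mimics row_number() over (partition by player_id order by match_date)
    for rn_all, (match_date, runs) in enumerate(rows, start=1):
        # seen[runs] mimics row_number() over (partition by player_id, runs ...)
        seen[runs] += 1
        labelled.append((match_date, runs, rn_all - seen[runs]))
    return labelled


labelled = label_groups(rows)

# Equivalent of "group by grp, runs ... count(*)" for this single player.
streaks = Counter((runs, grp) for _, runs, grp in labelled)
longest = streaks.most_common(1)[0]

for row in labelled:
    print(row)
print("longest streak:", longest)
```

In this trace the three consecutive rows with 4 runs all get grp 2, while the isolated 4 on 2014-04-19 gets grp 3, so grouping by (runs, grp) keeps the two apart.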



Answer 2:

-- For each row p1, count the rows p2 of the same player with the same runs,
-- dated on or before p1, that no different score separates from p1.
select p1.player_id, p1.match_date, p1.runs, count(p2.match_date)
from players p1
join players p2 on p1.player_id = p2.player_id
    and p1.match_date >= p2.match_date
    and p1.runs = p2.runs
    and not exists (
        -- a row strictly between p2 and p1 with a different runs value
        select 1 from players p3
        where p3.runs <> p2.runs
        and p3.player_id = p2.player_id
        and p3.match_date < p1.match_date
        and p3.match_date > p2.match_date
    )
group by p1.player_id, p1.match_date, p1.runs
order by p1.player_id, p1.match_date

http://sqlfiddle.com/#!15/78a77/1
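
The count this query returns per row appears to be the rank the question asks for: it restarts at 1 whenever runs changes from the previous row. A rough procedural sketch of the same idea follows, in Python rather than SQL. The function name running_streak is hypothetical, and the sketch assumes the rows are sorted by player_id and match_date, with at most one row per player per date.

```python
# Player 2's rows from the question, sorted by match_date.
rows = [
    (2, "2014-04-01", 23),
    (2, "2014-04-02", 23),
    (2, "2014-04-03", 23),
    (2, "2014-04-10", 23),
    (2, "2014-04-12", 4),
    (2, "2014-04-14", 3),
    (2, "2014-04-19", 23),
    (2, "2014-04-20", 1),
]


def running_streak(rows):
    """Append a counter that resets to 1 when player_id or runs changes."""
    out = []
    prev_key = None
    count = 0
    for player_id, match_date, runs in rows:
        key = (player_id, runs)
        # Extend the streak only if the previous row had the same player and runs.
        count = count + 1 if key == prev_key else 1
        out.append((player_id, match_date, runs, count))
        prev_key = key
    return out


ranked = running_streak(rows)
for row in ranked:
    print(row)
```

For player 2 the counter climbs to 4 over the first four rows and drops back to 1 for the 23 runs on 2014-04-19, which is the behaviour the question expected.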