Postgres: Defining the longest streak (in days) pe

2020-07-18 11:45发布

There is a software project with some data as exemplified in the table below. In this table we have the developer, the dates he did some changes (management requests) in the system and the difference between each one of these dates in days.

 SeqID  | developer |   MR date     |   dates diff
--------+-----------+---------------+----------------
1       |   Ivy     |   01/02/2012  |   0
2       |   Ivy     |   02/02/2012  |   1
3       |   Ivy     |   03/02/2012  |   1
4       |   Ivy     |   10/02/2012  |   7
5       |   Ivy     |   13/02/2012  |   3   
6       |   Ivy     |   14/02/2012  |   1
1       |   Ken     |   17/02/2012  |   0
2       |   Ken     |   20/02/2012  |   3
3       |   Ken     |   22/02/2012  |   2
4       |   Ken     |   23/02/2012  |   1
5       |   Ken     |   24/02/2012  |   1
6       |   Ken     |   25/02/2012  |   1
7       |   Ken     |   01/03/2012  |   4
8       |   Ken     |   05/03/2012  |   4
1       |   Bob     |   19/02/2012  |   0
2       |   Bob     |   23/02/2012  |   4
3       |   Bob     |   01/03/2012  |   6
4       |   Bob     |   02/03/2012  |   1
5       |   Bob     |   03/03/2012  |   1
6       |   Bob     |   05/03/2012  |   2

What I want to know is what is the longest streak (consecutive changes done in max 1 day of difference). Something very similar from what we have in github statistics per developer. So the resultant table would be like:

developer   |   longest streak  
------------+------------------------   
    Ivy     |   2   
    Ken     |   3   
    Bob     |   2   

The solution I tried was to count how many lines we have with diff date = 1 grouping by developer. But this does not return the wanted result.
As per the resultant table the count should be done like: Dev = Ivy ; longest streak = 01/02/2012 to 03/02/2012 = 2 days , etc.
Can some of you assist me on this matter? Thanks,

1条回答
迷人小祖宗
2楼-- · 2020-07-18 12:05

There is a trick to doing this. If you subtract an increasing sequence of numbers from the dates, then they will be constant for dates that are sequential. We can then use this for defining groups for each developer.

select developer, max(numdays) as maxseq
from (select developer, grp, min(MRDate) as MR_start, max(MRDate) as MR_end,
             count(distinct MRDate) as numdays
      from (select t.*,
                  (MRDate - dense_rank() over (partition by developer order by date)) as grp
            from t
           ) t
      group by developer, grp
     ) t
group by developer;

If you know there is at most one record per date, then you can use row_number() instead of dense_rank() and count(*) instead of count(distinct MRDate).

查看更多
登录 后发表回答