There is a software project with some data as exemplified in the table below. In this table we have the developer, the dates he did some changes (management requests) in the system and the difference between each one of these dates in days.
SeqID | developer | MR date | dates diff
--------+-----------+---------------+----------------
1 | Ivy | 01/02/2012 | 0
2 | Ivy | 02/02/2012 | 1
3 | Ivy | 03/02/2012 | 1
4 | Ivy | 10/02/2012 | 7
5 | Ivy | 13/02/2012 | 3
6 | Ivy | 14/02/2012 | 1
1 | Ken | 17/02/2012 | 0
2 | Ken | 20/02/2012 | 3
3 | Ken | 22/02/2012 | 2
4 | Ken | 23/02/2012 | 1
5 | Ken | 24/02/2012 | 1
6 | Ken | 25/02/2012 | 1
7 | Ken | 01/03/2012 | 4
8 | Ken | 05/03/2012 | 4
1 | Bob | 19/02/2012 | 0
2 | Bob | 23/02/2012 | 4
3 | Bob | 01/03/2012 | 6
4 | Bob | 02/03/2012 | 1
5 | Bob | 03/03/2012 | 1
6 | Bob | 05/03/2012 | 2
What I want to know is what is the longest streak (consecutive changes done in max 1 day of difference). Something very similar from what we have in github statistics per developer. So the resultant table would be like:
developer | longest streak
------------+------------------------
Ivy | 2
Ken | 3
Bob | 2
The solution I tried was to count how many lines we have with diff date = 1 grouping by developer. But this does not return the wanted result.
As per the resultant table the count should be done like: Dev = Ivy ; longest streak = 01/02/2012 to 03/02/2012 = 2 days , etc.
Can some of you assist me on this matter? Thanks,
There is a trick to doing this. If you subtract an increasing sequence of numbers from the dates, then they will be constant for dates that are sequential. We can then use this for defining groups for each developer.
If you know there is at most one record per date, then you can use
row_number()
instead ofdense_rank()
andcount(*)
instead ofcount(distinct MRDate)
.