Thanks to Mike for the suggestion to add the create/insert statements.
create table test (
pid integer not null,
date date not null,
primary key (pid, date)
);
insert into test values
(1,'2014-10-1')
, (1,'2014-10-2')
, (1,'2014-10-3')
, (1,'2014-10-5')
, (1,'2014-10-7')
, (2,'2014-10-1')
, (2,'2014-10-2')
, (2,'2014-10-3')
, (2,'2014-10-5')
, (2,'2014-10-7');
I want to add a new column that is 'days in current streak' so the result would look like:
pid | date | in_streak
-------|-----------|----------
1 | 2014-10-1 | 1
1 | 2014-10-2 | 2
1 | 2014-10-3 | 3
1 | 2014-10-5 | 1
1 | 2014-10-7 | 1
2 | 2014-10-2 | 1
2 | 2014-10-3 | 2
2 | 2014-10-4 | 3
2 | 2014-10-6 | 1
I've been trying to use the answers from
but I can't work out how to use the dense_rank()
trick with other window functions to get the right result.
Building on this table (not using the SQL keyword "date" as column name.):
Query:
Subtracting a
date
from anotherdate
yields aninteger
. Since you are looking for consecutive days, every next row would be greater by one. If we subtractrow_number()
from that, the whole streak ends up in the same group (grp
) perpid
. Then it's simple to deal out number per group.grp
is calculated with two subtractions, which should be fastest. An equally fast alternative could be:One multiplication, one subtraction. String concatenation and casting is more expensive. Test with
EXPLAIN ANALYZE
.Don't forget to partition by
pid
additionally in both steps, or you'll inadvertently mix groups that should be separated.Using a subquery, since that is typically faster than a CTE. There is nothing here that a plain subquery couldn't do.
And since you mentioned it:
dense_rank()
is obviously not necessary here. Basicrow_number()
does the job.You'll get more attention if you include CREATE TABLE statements and INSERT statements in your question.
The principle is simple. A streak of distinct, consecutive dates minus row_number() is a constant. You can group by the constant, and take the dense_rank() over that result.