Disclaimer: The shown problem is much more general than I expected first. The example below is taken from a solution to another question. But now I was taking this sample for solving many problems more - mostly related to time series (have a look at the "Linked" section in the right bar).
So I am trying to explain the problem more generally first:
I am using PostgreSQL but I am sure this problem exists in other window function supporting DBMS' (MS SQL Server, Oracle, ...) as well.
Window functions can be used to group certain values together by a common attribute or value. For example you can group rows by a date. Then you are able to calculate the max value within every single date or an average value or counting rows or whatever.
This can be achieved by defining a PARTITION
. Grouping by dates would work with PARTITION BY date_column
. Now you want to do an operation which needs a special order within your groups (calculating row numbers or sum up a column). This can be done with PARTITON BY date_column ORDER BY an_attribute_column
.
Now think about a finer resolution of time series. What if you do not have dates but timestamps. Then you cannot group by the time column anymore. But nevertheless it might be important to analyse the data in the order they were added (maybe the timestamp is the creating time of your data set). Then you realize that some consecutive rows have the same value and you want to group your data by this common value. But the clue is that the rows have different timestamps.
The problem here is that you cannot do a PARTITION BY value_column
. Because PARTITION BY
forces an ordering first. So your table would be ordered by the value_column
before the grouping and is not ordered by the timestamp anymore. This yields in results you are not expecting.
More general speaking: The problem is to ensure a special ordering even if the ordered column is not part of the created partition.
Example:
db<>fiddle
I have the following table:
ts val
100000 50
130100 30050
160100 60050
190200 100
220200 30100
250200 30100
300000 300
500000 100
550000 1000
600000 1000
650000 2000
700000 2000
720000 2000
750000 300
I had the problem that I had to group all tied values of the column val
. But I wanted to hold the order by ts
. To achieve this I wanted to add a column with a unique ID per val
group
Expected result:
ts val group
100000 50 1
130100 30050 2
160100 60050 3
190200 100 4
220200 30100 5 \ same group
250200 30100 5 /
300000 300 6
500000 100 7
550000 1000 8 \ same group
600000 1000 8 /
650000 2000 9 \
700000 2000 9 | same group
720000 2000 9 /
750000 300 10
First try was the use of the rank
window function which would do this job normally:
SELECT
*,
rank() OVER (PARTITION BY val ORDER BY ts)
FROM
test
But in this case this doesn't work because the PARTITION BY
clause orders the table first by its partition columns (val
in this case) and then by its ORDER BY
columns. So the order is by val, ts
instead of the expected order by ts
. So the result was not the expected one of course.
ts val rank
100000 50 1
190200 100 1
500000 100 2
300000 300 1
750000 300 2
550000 1000 1
600000 1000 2
650000 2000 1
700000 2000 2
720000 2000 3
130100 30050 1
220200 30100 1
250200 30100 2
160100 60050 1
The question is: How to get the group ids with respect to the order by ts
?
Edit: I added an own solution below but I feel very uncomfortable with it. It seems way too complicated. I was wondering if there's a better way to achieve this result.