My table has values like the following (RowCount is generated by the query below):
ID Date_trans Time_trans Price RowCount
------- ----------- ---------- ----- --------
1699093 22-Feb-2011 09:30:00 58.07 1
1699094 22-Feb-2011 09:30:00 58.08 1
1699095 22-Feb-2011 09:30:00 58.08 2
1699096 22-Feb-2011 09:30:00 58.08 3
1699097 22-Feb-2011 09:30:00 58.13 1
1699098 22-Feb-2011 09:30:00 58.13 2
1699099 22-Feb-2011 09:30:00 58.12 1
1699100 22-Feb-2011 09:30:08 58.13 3
1699101 22-Feb-2011 09:30:09 57.96 1
1699102 22-Feb-2011 09:30:09 57.95 1
1699103 22-Feb-2011 09:30:09 57.93 1
1699104 22-Feb-2011 09:30:09 57.96 2
1699105 22-Feb-2011 09:30:09 57.93 2
1699106 22-Feb-2011 09:30:09 57.93 3
1699107 22-Feb-2011 09:30:37 58 1
1699108 22-Feb-2011 09:30:37 58.08 4
1699109 22-Feb-2011 09:30:38 58.08 5
1699110 22-Feb-2011 09:30:41 58.02 1
1699111 22-Feb-2011 09:30:41 58.02 2
1699112 22-Feb-2011 09:30:41 58.01 1
1699113 22-Feb-2011 09:30:41 58.01 2
1699114 22-Feb-2011 09:30:41 58.01 3
1699115 22-Feb-2011 09:30:42 58.02 3
1699116 22-Feb-2011 09:30:42 58.02 4
1699117 22-Feb-2011 09:30:45 58.04 1
1699118 22-Feb-2011 09:30:54 58 2
1699119 22-Feb-2011 09:30:57 58.05 1
The ID column is an IDENTITY column.
And I'm using this query to get the consecutive row count as:
SELECT ID, Date_trans, Time_trans, Price
,ROW_NUMBER() OVER(PARTITION BY Price ORDER BY ID) RowCount
FROM MyTable
ORDER BY ID;
The RowCount I get is right for most of the values but wrong for some values. For instance:
- ID 1699100 Price 58.13 – count should be 1 (showing 3).
- ID 1699104 Price 57.96 – count should be 1 (showing 2).
- ID 1699105, 1699106 Price 57.93 – count should be 1, 2 (showing 2, 3).
I have tried the same query in PostgreSQL and found the same results.
I have uploaded a csv data sample here.
I'm stuck with such unexpected results of partition. Can anybody help me?
Pure SQL
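A minimal sketch of what such a pure SQL query could look like in PostgreSQL, assuming the table is named mytable as in the question. The CTE names x and y and the output column row_ct are chosen here for illustration; step and grp are the helper columns referred to in the logic.

```sql
WITH x AS (
   -- step is 1 where the price differs from the previous row (or there is
   -- no previous row), and NULL where the price repeats
   SELECT *
        , CASE WHEN lag(price) OVER (ORDER BY id) = price
               THEN NULL ELSE 1 END AS step
   FROM   mytable
   )
, y AS (
   -- running count of steps: every consecutive run of equal prices
   -- gets its own group number
   SELECT *, count(step) OVER (ORDER BY id) AS grp
   FROM   x
   )
SELECT id, date_trans, time_trans, price
     , row_number() OVER (PARTITION BY grp ORDER BY id) AS row_ct
FROM   y
ORDER  BY id;
```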
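The logic:
- Flag each row whose price differs from the price of the previous row as a step. (Special case of first row works, too.)
- Take the running count of steps to get a group number grp.
- Number the rows within each grp.

Honestly, I think @Andriy's solution is a wee bit more elegant. It needs three window functions, too, but can do it in only two query steps. In a quick test on the small sample it was also slightly faster. So, +1 from me.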
If performance is of the essence, a more specialized solution with a PL/pgSQL function should be considerably faster, because it only needs to scan and order the table once.
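A sketch of such a function, under stated assumptions: the function name f_row_ct, the table name mytable, and the column types are guesses for illustration and need to be adapted to the actual table definition.

```sql
CREATE OR REPLACE FUNCTION f_row_ct()  -- hypothetical name
  RETURNS TABLE (
     id         int      -- assumed type
   , date_trans date
   , time_trans time
   , price      numeric  -- assumed type
   , row_ct     int
  ) AS
$func$
DECLARE
   _last_price numeric;  -- price of the previous row, NULL before the first row
BEGIN
   -- single ordered pass over the table
   FOR id, date_trans, time_trans, price IN
      SELECT t.id, t.date_trans, t.time_trans, t.price
      FROM   mytable t
      ORDER  BY t.id
   LOOP
      IF _last_price = price THEN
         row_ct := row_ct + 1;  -- same price as previous row: keep counting
      ELSE
         row_ct := 1;           -- price changed (or first row): restart
      END IF;

      RETURN NEXT;              -- emit the current values of the output columns
      _last_price := price;
   END LOOP;
END
$func$ LANGUAGE plpgsql;
```

The comparison with a NULL _last_price evaluates to NULL on the first row, so the ELSE branch runs and the count starts at 1.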
Call:
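Assuming the function was created under the hypothetical name f_row_ct, the call could look like this:

```sql
-- f_row_ct is an illustrative name; use whatever name the function was created with
SELECT * FROM f_row_ct();
```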
In another quick test on the small sample this was 3-4x faster. Test with EXPLAIN ANALYZE to see.

As an aside: you could simplify your table (and queries) and save some bytes of storage by merging date_trans date and time_trans time into ts_trans timestamp.

It's very simple and very fast to extract date or time from a timestamp with a cast, for example ts_trans::date or ts_trans::time.

The manual about date/time types.
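To illustrate the aside, here is a small self-contained PostgreSQL sketch. It builds a single ts_trans value from a date and a time taken from the sample data, then casts it back; the subquery alias t is only there for the example.

```sql
SELECT ts_trans
     , ts_trans::date AS date_trans  -- date part of the timestamp
     , ts_trans::time AS time_trans  -- time-of-day part of the timestamp
FROM  (
   -- date + time yields a timestamp in PostgreSQL
   SELECT date '2011-02-22' + time '09:30:08' AS ts_trans
   ) AS t;
```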
1699100 Price 58.13 – is showing 3 because 1699097 and 1699098 are 1 and 2.
1699104 Price 57.96 – is showing 2 because 1699101 is 1.
1699105, 1699106 Price 57.93 – showing 2, 3, because 1699103 is 1.
If you want to find items of the same value in a sequence, one option is to join the data to the previous ID and see if the values are the same.
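A rough sketch of that join, assuming the IDs are contiguous (an IDENTITY column can have gaps, in which case ID - 1 would miss the previous row). The aliases cur and prev and the column SameAsPrevious are made up for the example.

```sql
SELECT cur.ID, cur.Price
     , CASE WHEN prev.Price = cur.Price THEN 1 ELSE 0 END AS SameAsPrevious
FROM MyTable AS cur
LEFT JOIN MyTable AS prev
       ON prev.ID = cur.ID - 1   -- assumption: no gaps in ID
ORDER BY cur.ID;
```

A row with SameAsPrevious = 0 starts a new run of identical prices, and the first row also gets 0 because it has no previous row.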
The PARTITION BY clause of the ROW_NUMBER() function instructs it to partition the entire row set by Price values and assign row numbers in the ascending order of IDs.

It seems like you want to distinguish between any two groups of rows with identical Price values that are separated by at least one row with a different Price.

There may be various ways to achieve that. In SQL Server (and I think the same would work in PostgreSQL too), I would first use two ROW_NUMBER() calls to get an additional partitioning criterion, then rank rows once again using that criterion.

Here's a SQL Fiddle demo.
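The approach with two ROW_NUMBER() calls followed by a third could be sketched as below. The CTE name partitioned, the helper column grp, and the alias RowCnt are illustrative names, not taken from the question.

```sql
WITH partitioned AS (
  SELECT
    *,
    -- the difference between the overall row number and the per-price row
    -- number stays constant within one consecutive run of the same price
    ROW_NUMBER() OVER (ORDER BY ID) -
    ROW_NUMBER() OVER (PARTITION BY Price ORDER BY ID) AS grp
  FROM MyTable
)
SELECT
  ID, Date_trans, Time_trans, Price,
  ROW_NUMBER() OVER (PARTITION BY Price, grp ORDER BY ID) AS RowCnt
FROM partitioned
ORDER BY ID;
```

With the sample data, ID 1699100 (Price 58.13) gets a different grp than 1699097 and 1699098, because the row with Price 58.12 sits in between, so its count restarts at 1.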
From what I can gather from your expectations of the results, you need to partition over Time_trans too:
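Presumably this means a query along the following lines. It is only a sketch of the suggestion; the alias RowCnt is an illustrative name.

```sql
SELECT ID, Date_trans, Time_trans, Price
     , ROW_NUMBER() OVER (PARTITION BY Price, Time_trans ORDER BY ID) AS RowCnt
FROM MyTable
ORDER BY ID;
```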
I believe this is the case, as you're expecting the ROW_NUMBER to start again when the Time_trans value changes as you progress through the data.
Also, you might want to add Date_trans in there too if there could be multiple dates in the table, which I would expect.