Deterministic sort order for window functions

2019-08-15 08:21发布

问题:

I've a status table and I want to fetch the latest details.

Slno |   ID | Status | date
  1  |   1  | Pass   | 15-06-2015 11:11:00 - this is inserted first
  2  |   1  | Fail   | 15-06-2015 11:11:00 - this is inserted second 
  3  |   2  | Fail   | 15-06-2015 12:11:11 - this is inserted first
  4  |   2  | Pass   | 15-06-2015 12:11:11 - this is inserted second

I use a window function with partition by ID order by date desc to fetch the first value.
Excepted Output :

2  |   1  | Fail   | 15-06-2015 11:11:00 - this is inserted second
4  |   2  | Pass   | 15-06-2015 12:11:11 - this is inserted second

Actual Output :

1  |   1  | Pass   | 15-06-2015 11:11:00 - this is inserted first
3  |   2  | Fail   | 15-06-2015 12:11:11 - this is inserted first

According to [http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_order_by_WF.html], adding a second ORDER BY column to the window function may solve the problem. But I don't have any other column to differentiate the rows!
Is there another approach to solve the issue?

EDIT: I've added slno here for clarity. I don't have slno as such in the table!

My SQL:

with range as (
  select id from status where date between 01-06-2015 and 30-06-2015
), latest as ( 
  select status, id, row_number() OVER (PARTITION BY id ORDER BY date DESC) row_num
)
select * from latest where row_num = 1

回答1:

If you don't have slno in your table, then you don't have any reliable information which row was inserted first. There is no natural order in a table, the physical order of rows can change any time (with any update, or with VACUUM, etc.)

You could use an unreliable trick: order by the internal ctid.

select *
from  (
   select id, status
        , row_number() OVER (PARTITION BY id
                             ORDER BY date, ctid) AS row_num
   from   status  -- that's your table name??
   where  date >= '2015-06-01'  -- assuming column is actually a date
   and    date <  '2015-07-01'
  ) sub
where  row_num = 1;
  • In absence of any other information which row came first (which is a design error to begin with, fix it!), you might try to save what you can using the internal tuple ID ctid

    • In-order sequence generation

    Rows will be in physical order when inserted initially, but that can change any time with any write operation to the table or VACUUM or other events.
    This is a measure of last resort and it will break.

  • Your presented query was invalid on several counts: missing column name in 1st CTE, missing table name in 2nd CTE, ...

  • You don't need a CTE for this.

Simpler with DISTINCT ON (considerations for ctid apply the same):

SELECT DISTINCT ON (id)
       id, status
FROM   status
WHERE  date >= '2015-06-01'
AND    date <  '2015-07-01'
ORDER  BY id, date, ctid;
  • Select first row in each GROUP BY group?