PostgreSQL: detecting the first/last rows of resul

2019-09-21 21:36发布

Is there any way to embed a flag in a select that indicates that it is the first or the last row of a result set? I'm thinking something to the effect of:

> SELECT is_first_row() AS f, is_last_row() AS l FROM blah;
  f  |  l
-----------
  t  |  f
  f  |  f
  f  |  f
  f  |  f
  f  |  t

The answer might be in window functions but I've only just learned about them, and I question their efficiency.

SELECT first_value(unique_column) OVER () = unique_column, last_value(unique_column) OVER () = unique_column, * FROM blah;

seems to do what I want. Unfortunately, I don't even fully understand that syntax, but since unique_column is unique and NOT NULL it should deliver unambiguous results. But if it does sorting, then the cure might be worse than the disease. (Actually, in my tests, unique_column is not sorted, so that's something.)

EXPLAIN ANALYZE doesn't indicate there's an efficiency problem, but when has it ever told me what I needed to know?

And I might need to use this in an aggregate function, but I've just been told window functions aren't allowed there.

4条回答
做自己的国王
2楼-- · 2019-09-21 21:43

It is simple using window functions with particular frames:

with t(x, y) as (select generate_series(1,5), random()) 
select *,
  count(*) over (rows between unbounded preceding and current row),
  count(*) over (rows between current row and unbounded following)
from t;
┌───┬───────────────────┬───────┬───────┐
│ x │         y         │ count │ count │
├───┼───────────────────┼───────┼───────┤
│ 1 │ 0.543995119165629 │     1 │     5 │
│ 2 │ 0.886343683116138 │     2 │     4 │
│ 3 │ 0.124682310037315 │     3 │     3 │
│ 4 │ 0.668972567655146 │     4 │     2 │
│ 5 │ 0.266671542543918 │     5 │     1 │
└───┴───────────────────┴───────┴───────┘

As you can see count(*) over (rows between unbounded preceding and current row) returns rows count from the data set beginning to current row and count(*) over (rows between current row and unbounded following) returns rows count from the current to data set end. 1 indicates the first/last rows.

It works until you ordering your data set by order by. In this case you need to duplicate it in the frames definitions:

with t(x, y) as (select generate_series(1,5), random()) 
select *,
  count(*) over (order by y rows between unbounded preceding and current row),
  count(*) over (order by y rows between current row and unbounded following)
from t order by y;
┌───┬───────────────────┬───────┬───────┐
│ x │         y         │ count │ count │
├───┼───────────────────┼───────┼───────┤
│ 1 │ 0.125781774986535 │     1 │     5 │
│ 4 │  0.25046408502385 │     2 │     4 │
│ 5 │ 0.538880597334355 │     3 │     3 │
│ 3 │ 0.802807193249464 │     4 │     2 │
│ 2 │ 0.869908029679209 │     5 │     1 │
└───┴───────────────────┴───────┴───────┘

PS: As mentioned by a_horse_with_no_name in the comment:

there is no such thing as the "first" or "last" row without sorting.

查看更多
时光不老,我们不散
3楼-- · 2019-09-21 21:46

In fact, Window Functions are a great approach and for that requirement of yours, they are awesome.

Regarding efficiency, window functions work over the data set already at hand. Which means the DBMS will just add extra processing to infer first/last values.

Just one thing I'd like to suggest: I like to put an ORDER BY criteria inside the OVER clause, just to ensure the data set order is the same between multiple executions, thus returning the same values to you.

查看更多
萌系小妹纸
4楼-- · 2019-09-21 22:03

You can use the lead() and lag() window functions (over the appropiate window) and compare them to NULL:


-- \i tmp.sql

CREATE TABLE ztable
( id SERIAL PRIMARY KEY
  , starttime TIMESTAMP
);

INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '1 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '2 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '3 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '4 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '5 minute');
INSERT INTO ztable (starttime) VALUES ( now() - INTERVAL '6 minute');

SELECT id, starttime
        , ( lead(id) OVER www IS NULL) AS is_first
        , ( lag(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY id )
ORDER BY id
        ;


SELECT id, starttime
        , ( lead(id) OVER www IS NULL) AS is_first
        , ( lag(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY starttime )
ORDER BY id
        ;

SELECT id, starttime
        , ( lead(id) OVER www IS NULL) AS is_first
        , ( lag(id) OVER www IS NULL) AS is_last
FROM ztable
WINDOW www AS (ORDER BY starttime )
ORDER BY random()
        ;

Result:


INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
INSERT 0 1
 id |         starttime          | is_first | is_last 
----+----------------------------+----------+---------
  1 | 2018-08-31 18:38:45.567393 | f        | t
  2 | 2018-08-31 18:37:45.575586 | f        | f
  3 | 2018-08-31 18:36:45.587436 | f        | f
  4 | 2018-08-31 18:35:45.592316 | f        | f
  5 | 2018-08-31 18:34:45.600619 | f        | f
  6 | 2018-08-31 18:33:45.60907  | t        | f
(6 rows)

 id |         starttime          | is_first | is_last 
----+----------------------------+----------+---------
  1 | 2018-08-31 18:38:45.567393 | t        | f
  2 | 2018-08-31 18:37:45.575586 | f        | f
  3 | 2018-08-31 18:36:45.587436 | f        | f
  4 | 2018-08-31 18:35:45.592316 | f        | f
  5 | 2018-08-31 18:34:45.600619 | f        | f
  6 | 2018-08-31 18:33:45.60907  | f        | t
(6 rows)

 id |         starttime          | is_first | is_last 
----+----------------------------+----------+---------
  2 | 2018-08-31 18:37:45.575586 | f        | f
  4 | 2018-08-31 18:35:45.592316 | f        | f
  6 | 2018-08-31 18:33:45.60907  | f        | t
  5 | 2018-08-31 18:34:45.600619 | f        | f
  1 | 2018-08-31 18:38:45.567393 | t        | f
  3 | 2018-08-31 18:36:45.587436 | f        | f
(6 rows)

[updated: added a randomly sorted case]

查看更多
姐就是有狂的资本
5楼-- · 2019-09-21 22:08

Try using

SELECT columns 
FROM mytable 
Join conditions
WHERE conditions ORDER BY date DESC LIMIT 1

UNION ALL 

SELECT columns
FROM mytable 
Join conditions
WHERE conditions ORDER BY date ASC LIMIT 1

SELECT just cut half of the processing time. You can go for indexing also.

查看更多
登录 后发表回答