Is there a way to select rows until some condition is met? I.e. a type of limit
, but not limited to N
rows, but to all the rows until the first non-matching row?
For example, say I have the table:
CREATE TABLE t (id SERIAL PRIMARY KEY, rank INTEGER, value INTEGER);
INSERT INTO t (rank, value) VALUES ( 1, 1), (2, 1), (2,2),(3,1);
that is:
test=# SELECT * FROM t;
id | rank | value
----+------+-------
1 | 1 | 1
2 | 2 | 1
3 | 2 | 2
4 | 3 | 1
(4 rows)
I want to order by rank, and select up until the first row that is over 1.
I.e. SELECT * FROM t ORDER BY rank UNTIL value>1
and I want the first 2 rows back?
One solution is to use a subquery and bool_or
:
SELECT * FROM
( SELECT id, rank, value, bool_and(value<2) OVER (order by rank, id) AS ok FROM t ORDER BY rank) t2
WHERE ok=true
BUT wont that end up going through all rows, even if I only want a handful?
(real world context: I have timestamped events in a table, I can use a window query lead/lag to select the time between two events, I want all event from now
going back as long as they happened less than 10 minutes apart – the lead/lag
window query complicates things, so simplified example here)
edit: made window-function order by rank, id
This may be no better than your solution, since you begged the question, "won't that end up going through all rows?"
I can tell you this -- the explain plan is different than your solution. I don't know how the guts of PostgreSQL works, but if I were writing a "max" function, I would think it would always be O(n). By contrast, you had an order by which is average case O(n log n), worst case O(n^2).
That said, I cannot deny that this will go through all rows:
select * from sandbox.t
where id < (select min (id) from sandbox.t where value > 1)
One thing to clarify, though, is that unless you scan all rows, I'm not sure how you could determine the minimum value. Any time you invoke an aggregate concept across all records, doesn't that mean that you must read all rows?
What you want is a sort of stop-condition. As far as I am aware there is no such thing in SQL, at least PostgreSQL's dialect.
What you can do is use a PL/PgSQL procedure to read rows from a cursor and return them until the stop condition is met. It won't be super fast, but it'll be alright. It's just a FOR
loop over a query with an IF expression THEN exit; ELSE return next; END IF;
. No explicit cursor is required because PL/PgSQL will use one internally if you FOR
loop over a query.
Another option is to create a cursor and read chunks of rows from it in the application, then discard part of the last chunk once the stop condition is met.
Either way, a cursor is going to be what you want.
A stop expression wouldn't actually be too hard to implement in PostgreSQL by the way. You'd have to implement a new executor node type, but the new CustomScan support would make that practical to do in an extension. Then you'd just evaluate an expression to decide whether or not to carry on fetching rows.
You can try something such as:
select * from t, (
select rank from t where value = 1 order by "rank" limit 1) x
where t.rank <= x.rank order by rank;
It will make two passes through the first part of the table (which you might be able to cut by creating an index on (rank, value = 1)) but shouldn't evaluate the rest of the table if you have an index on rank.
[If you could have window expressions in where clauses you could use a window expression to make sure any previous rows didn't have value = 1.. but even if this were possible, then getting the query evaluator to use to limit search would be yet another challenge.]