In PostgreSQL 9.4 the window functions have the new option of a FILTER
to select a sub-set of the window frame for processing. The documentation mentions it, but provides no sample. An online search yields some samples, including from 2ndQuadrant but all that I found were rather trivial examples with constant expressions. What I am looking for is a filter expression that includes the value of the current row.
Assume I have a table with a bunch of columns, one of which is of date
type:
col1 | col2 | dt
------------------------
1 | a | 2015-07-01
2 | b | 2015-07-03
3 | c | 2015-07-10
4 | d | 2015-07-11
5 | e | 2015-07-11
6 | f | 2015-07-13
...
A window definition for processing on the date
over the entire table is trivially constructed: WINDOW win AS (ORDER BY dt)
I am interested in knowing how many rows are present in, say, the 4 days prior to the current row (inclusive). So I want to generate this output:
col1 | col2 | dt | count
--------------------------------
1 | a | 2015-07-01 | 1
2 | b | 2015-07-03 | 2
3 | c | 2015-07-10 | 1
4 | d | 2015-07-11 | 3
5 | e | 2015-07-11 | 3
6 | f | 2015-07-13 | 4
...
The FILTER
clause of the window functions seems like the obvious choice:
count(*) FILTER (WHERE current_row.dt - dt <= 4) OVER win
But how do I specify current_row.dt
(for lack of a better syntax)? Is this even possible?
If this is not possible, are there other ways of selecting date
ranges in a window frame? The frame specification is no help as it is all row-based.
I am not interested in alternative solutions using sub-queries, it has to be based on window processing.
You are not actually aggregating rows, so the new aggregate FILTER
clause is not the right tool. A window function is more like it, a problem remains, however: the frame definition of a window cannot depend on values of the current row. It can only count a given number of rows preceding or following with the ROWS
clause.
To make that work, aggregate counts per day and LEFT JOIN
to a full set of days in range. Then you can apply a window function:
SELECT t.*, ct.ct_last4days
FROM (
SELECT *, sum(ct) OVER (ORDER BY dt ROWS 3 PRECEDING) AS ct_last4days
FROM (
SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
FROM tbl t1
) d
LEFT JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) t USING (dt)
) ct
JOIN tbl t USING (dt);
Omitting ORDER BY dt
in the widow frame definition usually works, since the order is carried over from generate_series()
in the subquery. But there are no guarantees in the SQL standard without explicit ORDER BY
and it might break in more complex queries.
SQL Fiddle.
Related:
- Select finishes where athlete didn't finish first for the past 3 events
- PostgreSQL: running count of rows for a query 'by minute'
- PostgreSQL unnest() with element number
I don't think there is any syntax that means "current row" in an expression. The gram.y file for postgres makes a filter clause
take just an a_expr, which is just the normal expression clauses. There
is nothing specific to window functions or filter clauses in an expression.
As far as I can find, the only current row notion in a window clause is for specifying the window frame boundaries. I don't think this gets you
what you want.
It's possible that you could get some traction from an enclosing query:
http://www.postgresql.org/docs/current/static/sql-expressions.html
When an aggregate expression appears in a subquery (see Section 4.2.11
and Section 9.22), the aggregate is normally evaluated over the rows
of the subquery. But an exception occurs if the aggregate's arguments
(and filter_clause if any) contain only outer-level variables: the
aggregate then belongs to the nearest such outer level, and is
evaluated over the rows of that query.
but it's not obvious to me how.