可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

In PostgreSQL 9.4 the window functions have the new option of a FILTER to select a sub-set of the window frame for processing. The documentation mentions it, but provides no sample. An online search yields some samples, including from 2ndQuadrant but all that I found were rather trivial examples with constant expressions. What I am looking for is a filter expression that includes the value of the current row.

Assume I have a table with a bunch of columns, one of which is of date type:

col1 | col2 |     dt
------------------------
  1  |  a   | 2015-07-01
  2  |  b   | 2015-07-03
  3  |  c   | 2015-07-10
  4  |  d   | 2015-07-11
  5  |  e   | 2015-07-11
  6  |  f   | 2015-07-13
...

A window definition for processing on the date over the entire table is trivially constructed: WINDOW win AS (ORDER BY dt)

I am interested in knowing how many rows are present in, say, the 4 days prior to the current row (inclusive). So I want to generate this output:

col1 | col2 |     dt     | count
--------------------------------
  1  |  a   | 2015-07-01 |   1
  2  |  b   | 2015-07-03 |   2
  3  |  c   | 2015-07-10 |   1
  4  |  d   | 2015-07-11 |   3
  5  |  e   | 2015-07-11 |   3
  6  |  f   | 2015-07-13 |   4
...

The FILTER clause of the window functions seems like the obvious choice:

count(*) FILTER (WHERE current_row.dt - dt <= 4) OVER win

But how do I specify current_row.dt (for lack of a better syntax)? Is this even possible?

If this is not possible, are there other ways of selecting date ranges in a window frame? The frame specification is no help as it is all row-based.

I am not interested in alternative solutions using sub-queries, it has to be based on window processing.

回答1:

You are not actually aggregating rows, so the new aggregate FILTER clause is not the right tool. A window function is more like it, a problem remains, however: the frame definition of a window cannot depend on values of the current row. It can only count a given number of rows preceding or following with the ROWS clause.

To make that work, aggregate counts per day and LEFT JOIN to a full set of days in range. Then you can apply a window function:

SELECT t.*, ct.ct_last4days
FROM  (
   SELECT *, sum(ct) OVER (ORDER BY dt ROWS 3 PRECEDING) AS ct_last4days
   FROM  (
      SELECT generate_series(min(dt), max(dt), interval '1 day')::date AS dt
      FROM   tbl t1
      ) d
   LEFT   JOIN (SELECT dt, count(*) AS ct FROM tbl GROUP BY 1) t USING (dt)
   ) ct
JOIN  tbl t USING (dt);

Omitting ORDER BY dt in the widow frame definition usually works, since the order is carried over from generate_series() in the subquery. But there are no guarantees in the SQL standard without explicit ORDER BY and it might break in more complex queries.

SQL Fiddle.

Select finishes where athlete didn't finish first for the past 3 events
PostgreSQL: running count of rows for a query 'by minute'
PostgreSQL unnest() with element number

回答2:

I don't think there is any syntax that means "current row" in an expression. The gram.y file for postgres makes a filter clause take just an a_expr, which is just the normal expression clauses. There is nothing specific to window functions or filter clauses in an expression. As far as I can find, the only current row notion in a window clause is for specifying the window frame boundaries. I don't think this gets you what you want.

It's possible that you could get some traction from an enclosing query:

http://www.postgresql.org/docs/current/static/sql-expressions.html

When an aggregate expression appears in a subquery (see Section 4.2.11 and Section 9.22), the aggregate is normally evaluated over the rows of the subquery. But an exception occurs if the aggregate's arguments (and filter_clause if any) contain only outer-level variables: the aggregate then belongs to the nearest such outer level, and is evaluated over the rows of that query.

but it's not obvious to me how.