SQL window function with a where clause?

2019-04-22 08:27发布

问题:

I'm trying to correlate two types of events for users. I want to see all event "B"s along with the most recent event "A" for that user prior to the "A" event. How would one accomplish this? In particular, I'm trying to do this in Postgres.

I was hoping it was possible to use a "where" clause in a window function, in which case I could essentially do a LAG() with a "where event='A'", but that doesn't seem to be possible.

Any recommendations?

Data example:

|user |time|event|
|-----|----|-----|
|Alice|1   |A    |
|Bob  |2   |A    |
|Alice|3   |A    |
|Alice|4   |B    |
|Bob  |5   |B    |
|Alice|6   |B    |

Desired result:

|user |event_b_time|last_event_a_time|
|-----|------------|-----------------|
|Alice|4           |3                |
|Bob  |5           |2                |
|Alice|6           |3                |

回答1:

Just tried Gordon's approach using PostgreSQL 9.5.4, and it complained that

FILTER is not implemented for non-aggregate window functions

which means using lag() with FILTER is not allowed. So I modified Gordon's query using max(), a different window frame, and CTE:

WITH subq AS (
  SELECT
    "user", event, time as event_b_time,
    max(time) FILTER (WHERE event = 'A') OVER (
      PARTITION BY "user"
      ORDER BY time
      ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
    ) AS last_event_a_time
  FROM events
  ORDER BY time
)
SELECT
  "user", event_b_time, last_event_a_time
FROM subq
WHERE event = 'B';

Verified that this works with PostgreSQL 9.5.4.

Thanks to Gordon for the FILTER trick!



回答2:

Here is one method:

select t.*
from (select t.*,
             lag(time) filter (where event = 'A') (partition by user order by time)
      from t
     ) t
where event = 'B';

It is possible that the correlated subquery/lateral join would have better performance.



回答3:

There is not need for window functions here. Just find all B events, and for each one of them, find the most recent A of the same user via a subquery. Something like that should do it:

SELECT
    "user",
    time AS event_b_time,
    (SELECT time AS last_event_a_time
     FROM t t1
     WHERE "user"=t.user AND event='A' AND time<t.time
     ORDER BY time DESC LIMIT 1)
FROM t
WHERE event='B';

I assume that the table is called t (I used it twice).