I'm trying to correlate two types of events for users. I want to see all event "B"s along with the most recent event "A" for that user prior to the "A" event. How would one accomplish this? In particular, I'm trying to do this in Postgres.
I was hoping it was possible to use a "where" clause in a window function, in which case I could essentially do a LAG() with a "where event='A'", but that doesn't seem to be possible.
Any recommendations?
Data example:
|user |time|event|
|-----|----|-----|
|Alice|1 |A |
|Bob |2 |A |
|Alice|3 |A |
|Alice|4 |B |
|Bob |5 |B |
|Alice|6 |B |
Desired result:
|user |event_b_time|last_event_a_time|
|-----|------------|-----------------|
|Alice|4 |3 |
|Bob |5 |2 |
|Alice|6 |3 |
Just tried Gordon's approach using PostgreSQL 9.5.4, and it complained that
FILTER is not implemented for non-aggregate window functions
which means using lag()
with FILTER
is not allowed. So I modified Gordon's query using max()
, a different window frame, and CTE:
WITH subq AS (
SELECT
"user", event, time as event_b_time,
max(time) FILTER (WHERE event = 'A') OVER (
PARTITION BY "user"
ORDER BY time
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS last_event_a_time
FROM events
ORDER BY time
)
SELECT
"user", event_b_time, last_event_a_time
FROM subq
WHERE event = 'B';
Verified that this works with PostgreSQL 9.5.4.
Thanks to Gordon for the FILTER
trick!
Here is one method:
select t.*
from (select t.*,
lag(time) filter (where event = 'A') (partition by user order by time)
from t
) t
where event = 'B';
It is possible that the correlated subquery/lateral join would have better performance.
There is not need for window functions here. Just find all B
events, and for each one of them, find the most recent A
of the same user via a subquery. Something like that should do it:
SELECT
"user",
time AS event_b_time,
(SELECT time AS last_event_a_time
FROM t t1
WHERE "user"=t.user AND event='A' AND time<t.time
ORDER BY time DESC LIMIT 1)
FROM t
WHERE event='B';
I assume that the table is called t
(I used it twice).