I have a table like this:
Name activity time
user1 A1 12:00
user1 E3 12:01
user1 A2 12:02
user2 A1 10:05
user2 A2 10:06
user2 A3 10:07
user2 M6 10:07
user2 B1 10:08
user3 A1 14:15
user3 B2 14:20
user3 D1 14:25
user3 D2 14:30
Now, I need a result like this:
Name activity next_activity
user1 A2 NULL
user2 A3 B1
user3 A1 B2
I would like to check for every user the last activity from group A and what type of activity took place next from group B (activity from group B always takes place after activity from group A). Other types of activity are not interesting for me. I've tried to use the lead()
function, but it hasn't worked.
How I can solve my problem?
Test setup:
CREATE TEMP TABLE t (name text, activity text, time time);
INSERT INTO t values
('user1', 'A1', '12:00')
,('user1', 'E3', '12:01')
,('user1', 'A2', '12:02')
,('user2', 'A1', '10:05')
,('user2', 'A2', '10:06')
,('user2', 'A3', '10:07')
,('user2', 'M6', '10:07')
,('user2', 'B1', '10:08')
,('user3', 'A1', '14:15')
,('user3', 'B2', '14:20')
,('user3', 'D1', '14:25')
,('user3', 'D2', '14:30');
Your definition:
activity from group B always takes place after activity from group A.
.. logically implies that there is, per user, 0 or 1 B activity after 1 or more A activities. Never more than 1 B activities in sequence.
You can make it work with a single window function, DISTINCT ON
and CASE
, which should be the fastest way for few rows per user (also see below):
SELECT name
, CASE WHEN a2 LIKE 'B%' THEN a1 ELSE a2 END AS activity
, CASE WHEN a2 LIKE 'B%' THEN a2 END AS next_activity
FROM (
SELECT DISTINCT ON (name)
name
, lead(activity) OVER (PARTITION BY name ORDER BY time DESC) AS a1
, activity AS a2
FROM t
WHERE (activity LIKE 'A%' OR activity LIKE 'B%')
ORDER BY name, time DESC
) sub;
An SQL CASE
expression defaults to NULL
if no ELSE
branch is added, so I kept that short.
Also assuming time
is defined NOT NULL
. Else, you might want to add NULLS LAST
. Why?
- Select first row in each GROUP BY group?
(activity LIKE 'A%' OR activity LIKE 'B%')
is more verbose than activity ~ '^[AB]'
, but typically faster in older versions of Postgres. About pattern matching:
- Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
Conditional window functions?
That's actually possible. You can combine the aggregate FILTER
clause with the OVER
clause of window functions. However:
The FILTER
clause itself can only work with values from the current row.
More importantly, FILTER
is not implemented for pure window functions like lead()
or lag()
in Postgres 9.6 (yet) - only for aggregate functions.
If you try:
lead(activity) FILTER (WHERE activity LIKE 'A%') OVER () AS activity
Postgres will tell you:
FILTER is not implemented for non-aggregate window functions
About FILTER
:
- How can I simplify this game statistics query?
- Referencing current row in FILTER clause of window function
Performance
(For few users with few rows per user, pretty much any query is fast, even without index.)
For many users and few rows per user, the first query above should be fastest. See the linked answer above about index and performance.
For many rows per user, there are (potentially much) faster techniques, depending on other details of your setup:
- Optimize GROUP BY query to retrieve latest record per user
select distinct on(name) name,activity,next_activity
from (select name,activity,time
,lead(activity) over (partition by name order by time) as next_activity
from t
where left(activity,1) in ('A','B')
) t
where left(activity,1) = 'A'
order by name,time desc