I have a PostgreSQL database that stores users in a users
table and conversations they take part in a conversation
table. Since each user can take part in multiple conversations and each conversation can involve multiple users, I have a conversation_user
linking table to track which users are participating in each conversation:
# conversation_user
id | conversation_id | user_id
----+------------------+--------
1 | 1 | 32
2 | 1 | 3
3 | 2 | 32
4 | 2 | 3
5 | 2 | 4
In the above table, user 32 is having one conversation with just user 3 and another with both 3 and user 4. How would I write a query that would show that there is a conversation between just user 32 and user 3?
I've tried the following:
SELECT conversation_id AS cid,
user_id
FROM conversation_user
GROUP BY cid HAVING count(*) = 2
AND (user_id = 32
OR user_id = 3);
SELECT conversation_id AS cid,
user_id
FROM conversation_user
GROUP BY (cid HAVING count(*) = 2
AND (user_id = 32
OR user_id = 3));
SELECT conversation_id AS cid,
user_id
FROM conversation_user
WHERE (user_id = 32)
OR (user_id = 3)
GROUP BY cid HAVING count(*) = 2;
These queries throw an error that says that user_id must appear in the GROUP BY
clause or be used in an aggregate function. Putting them in an aggregate function (e.g. MIN
or MAX
) doesn't sound appropriate. I thought that my first two attempts were putting them in the GROUP BY
clause.
What am I doing wrong?
This is a case of relational division. We have assembled an arsenal of techniques under this related question:
- How to filter SQL results in a has-many-through relation
The special difficulty is to exclude additional users. There are basically 4 techniques.
- Select rows which are not present in other table
I suggest LEFT JOIN
/ IS NULL
:
SELECT cu1.conversation_id
FROM conversation_user cu1
JOIN conversation_user cu2 USING (conversation_id)
LEFT JOIN conversation_user cu3 ON cu3.conversation_id = cu1.conversation_id
AND cu3.user_id NOT IN (3,32)
WHERE cu1.user_id = 32
AND cu2.user_id = 3
AND cu3.conversation_id IS NULL;
Or NOT EXISTS
:
SELECT cu1.conversation_id
FROM conversation_user cu1
JOIN conversation_user cu2 USING (conversation_id)
WHERE cu1.user_id = 32
AND cu2.user_id = 3
AND NOT EXISTS (
SELECT 1
FROM conversation_user cu3
WHERE cu3.conversation_id = cu1.conversation_id
AND cu3.user_id NOT IN (3,32)
);
Both queries do not depend on a UNIQUE
constraint for (conversation_id, user_id)
, which may or may not be in place. Meaning, the query even works if user_id
32 (or 3) is listed more than once for the same conversation. You would get duplicate rows in the result, though, and need to apply DISTINCT
or GROUP BY
.
The only condition is the one you formulated:
... a query that would show that there is a conversation between just user 32 and user 3?
Audited query
The query you linked in the comment wouldn't work. You forgot to exclude other participants. Should be something like:
SELECT * -- or whatever you want to return
FROM conversation_user cu1
WHERE cu1.user_id = 32
AND EXISTS (
SELECT 1
FROM conversation_user cu2
WHERE cu2.conversation_id = cu1.conversation_id
AND cu2.user_id = 3
)
AND NOT EXISTS (
SELECT 1
FROM conversation_user cu3
WHERE cu3.conversation_id = cu1.conversation_id
AND cu3.user_id NOT IN (3,32)
);
Which is similar to the other two queries, except that it will not return multiple rows if user_id = 3
is linked multiple times.
You can use conditional aggregation to select all cids that only have 2 specific particpants
select cid from conversation_user
group by cid
having count(*) = 2
and count(case when user_id not in (32,3) then 1 end) = 0
If (cid,user_id)
is not unique then replace having count(*) = 2
with having count(distinct user_id) = 2
if you just want confirmation.
select conversation_id
from conversation_users
group by conversation_id
having bool_and ( user_id in (3,32))
and count(*) = 2;
if you want full details,
you can use a window function and a CTE like this:
with a as (
select *
,not bool_and( user_id in (3,32) )
over ( partition by conversation_id)
and 2 = count(user_id)
over ( partition by conversation_id)
as conv_candidates
from conversation_users
)
select * from a where conv_candidates;
Because you want conversations with just 2 users, you can use a self outer join on other users and filter out hits:
To find all 2-user conversations and they're between:
SELECT
a.conversation_id cid,
a.user_id user_id_1,
b.user_id user_id_2
FROM conversation_user a
JOIN conversation_user b ON b.cid = a.cid
AND b.user_id > a.user_id
LEFT JOIN conversation_user c ON c.cid = a.cid
AND c.user_id NOT IN (a.user_id, b.user_id)
WHERE c.cid IS NULL -- only return misses on join to others
To find all 2-user conversations for a particular user, just add:
AND a.user_id = 32