Find rows that have same value in one column and o

2020-07-11 04:51发布

问题:

I have a PostgreSQL database that stores users in a users table and conversations they take part in a conversation table. Since each user can take part in multiple conversations and each conversation can involve multiple users, I have a conversation_user linking table to track which users are participating in each conversation:

# conversation_user
id  |  conversation_id | user_id
----+------------------+--------
1   |                1 |      32
2   |                1 |       3
3   |                2 |      32
4   |                2 |       3
5   |                2 |       4

In the above table, user 32 is having one conversation with just user 3 and another with both 3 and user 4. How would I write a query that would show that there is a conversation between just user 32 and user 3?

I've tried the following:

SELECT conversation_id AS cid,
       user_id
FROM conversation_user
GROUP BY cid HAVING count(*) = 2
AND (user_id = 32
     OR user_id = 3);

SELECT conversation_id AS cid,
   user_id
FROM conversation_user
GROUP BY (cid HAVING count(*) = 2
AND (user_id = 32
     OR user_id = 3));

SELECT conversation_id AS cid,
       user_id
FROM conversation_user
WHERE (user_id = 32)
  OR (user_id = 3)
GROUP BY cid HAVING count(*) = 2;

These queries throw an error that says that user_id must appear in the GROUP BY clause or be used in an aggregate function. Putting them in an aggregate function (e.g. MIN or MAX) doesn't sound appropriate. I thought that my first two attempts were putting them in the GROUP BY clause.

What am I doing wrong?

回答1:

This is a case of relational division. We have assembled an arsenal of techniques under this related question:

  • How to filter SQL results in a has-many-through relation

The special difficulty is to exclude additional users. There are basically 4 techniques.

  • Select rows which are not present in other table

I suggest LEFT JOIN / IS NULL:

SELECT cu1.conversation_id
FROM        conversation_user cu1
JOIN        conversation_user cu2 USING (conversation_id)
LEFT   JOIN conversation_user cu3 ON cu3.conversation_id = cu1.conversation_id
                                 AND cu3.user_id NOT IN (3,32)
WHERE  cu1.user_id = 32
AND    cu2.user_id = 3
AND    cu3.conversation_id IS NULL;

Or NOT EXISTS:

SELECT cu1.conversation_id
FROM   conversation_user cu1
JOIN   conversation_user cu2 USING (conversation_id)
WHERE  cu1.user_id = 32
AND    cu2.user_id = 3
AND NOT EXISTS (
   SELECT 1
   FROM   conversation_user cu3
   WHERE  cu3.conversation_id = cu1.conversation_id
   AND    cu3.user_id NOT IN (3,32)
   );

Both queries do not depend on a UNIQUE constraint for (conversation_id, user_id), which may or may not be in place. Meaning, the query even works if user_id 32 (or 3) is listed more than once for the same conversation. You would get duplicate rows in the result, though, and need to apply DISTINCT or GROUP BY.
The only condition is the one you formulated:

... a query that would show that there is a conversation between just user 32 and user 3?

Audited query

The query you linked in the comment wouldn't work. You forgot to exclude other participants. Should be something like:

SELECT *  -- or whatever you want to return
FROM   conversation_user cu1
WHERE  cu1.user_id = 32
AND    EXISTS (
   SELECT 1
   FROM   conversation_user cu2
   WHERE  cu2.conversation_id = cu1.conversation_id 
   AND    cu2.user_id = 3
   )
AND NOT EXISTS (
   SELECT 1
   FROM   conversation_user cu3
   WHERE  cu3.conversation_id = cu1.conversation_id
   AND    cu3.user_id NOT IN (3,32)
   );

Which is similar to the other two queries, except that it will not return multiple rows if user_id = 3 is linked multiple times.



回答2:

You can use conditional aggregation to select all cids that only have 2 specific particpants

select cid from conversation_user
group by cid
having count(*) = 2
and count(case when user_id not in (32,3) then 1 end) = 0

If (cid,user_id) is not unique then replace having count(*) = 2 with having count(distinct user_id) = 2



回答3:

if you just want confirmation.

 select conversation_id 
   from  conversation_users 
   group by conversation_id
   having bool_and ( user_id in (3,32))
      and count(*) = 2;

if you want full details, you can use a window function and a CTE like this:

 with a as (
   select *
      ,not bool_and( user_id in (3,32) )
         over  ( partition by conversation_id) 
       and 2 = count(user_id)
         over  ( partition by conversation_id)
           as conv_candidates 
   from conversation_users 
   ) 
 select * from a where conv_candidates;


回答4:

Because you want conversations with just 2 users, you can use a self outer join on other users and filter out hits:

To find all 2-user conversations and they're between:

SELECT
    a.conversation_id cid,
    a.user_id user_id_1,
    b.user_id user_id_2
FROM conversation_user a
JOIN conversation_user b ON b.cid = a.cid
  AND b.user_id > a.user_id
LEFT JOIN conversation_user c ON c.cid = a.cid
  AND c.user_id NOT IN (a.user_id, b.user_id)
WHERE c.cid IS NULL -- only return misses on join to others

To find all 2-user conversations for a particular user, just add:

AND a.user_id = 32