How to filter SQL results in a has-many-through re

2018-12-31 00:30发布

Assuming I have the tables student, club, and student_club:

student {
    id
    name
}
club {
    id
    name
}
student_club {
    student_id
    club_id
}

I want to know how to find all students in both the soccer (30) and baseball (50) club.
While this query doesn't work, it's the closest thing I have so far:

SELECT student.*
FROM   student
INNER  JOIN student_club sc ON student.id = sc.student_id
LEFT   JOIN club c ON c.id = sc.club_id
WHERE  c.id = 30 AND c.id = 50

13条回答
高级女魔头
2楼-- · 2018-12-31 01:17

So there's more than one way to skin a cat.
I'll to add two more to make it, well, more complete.

1) GROUP first, JOIN later

Assuming a sane data model where (student_id, club_id) is unique in student_club. Martin Smith's second version is like somewhat similar, but he joins first, groups later. This should be faster:

SELECT s.id, s.name
  FROM student s
  JOIN (
   SELECT student_id
     FROM student_club
    WHERE club_id IN (30, 50)
    GROUP BY 1
   HAVING COUNT(*) > 1
       ) sc USING (student_id);

2) EXISTS

And of course, there is the classic EXISTS. Similar to Derek's variant with IN. Simple and fast. (In MySQL, this should be quite a bit faster than the variant with IN):

SELECT s.id, s.name
  FROM student s
 WHERE EXISTS (SELECT 1 FROM student_club
               WHERE  student_id = s.student_id AND club_id = 30)
   AND EXISTS (SELECT 1 FROM student_club
               WHERE  student_id = s.student_id AND club_id = 50);
查看更多
梦该遗忘
3楼-- · 2018-12-31 01:17

Since noone has added this (classic) version:

SELECT s.*
FROM student AS s
WHERE NOT EXISTS
      ( SELECT *
        FROM club AS c 
        WHERE c.id IN (30, 50)
          AND NOT EXISTS
              ( SELECT *
                FROM student_club AS sc 
                WHERE sc.student_id = s.id
                  AND sc.club_id = c.id  
              )
      )

or similar:

SELECT s.*
FROM student AS s
WHERE NOT EXISTS
      ( SELECT *
        FROM
          ( SELECT 30 AS club_id  
          UNION ALL
            SELECT 50
          ) AS c
        WHERE NOT EXISTS
              ( SELECT *
                FROM student_club AS sc 
                WHERE sc.student_id = s.id
                  AND sc.club_id = c.club_id  
              )
      )

One more try with a slightly different approach. Inspired by an article in Explain Extended: Multiple attributes in a EAV table: GROUP BY vs. NOT EXISTS:

SELECT s.*
FROM student_club AS sc
  JOIN student AS s
    ON s.student_id = sc.student_id
WHERE sc.club_id = 50                      --- one option here
  AND NOT EXISTS
      ( SELECT *
        FROM
          ( SELECT 30 AS club_id           --- all the rest in here
                                           --- as in previous query
          ) AS c
        WHERE NOT EXISTS
              ( SELECT *
                FROM student_club AS scc 
                WHERE scc.student_id = sc.id
                  AND scc.club_id = c.club_id  
              )
      )

Another approach:

SELECT s.stud_id
FROM   student s

EXCEPT

SELECT stud_id
FROM 
  ( SELECT s.stud_id, c.club_id
    FROM student s 
      CROSS JOIN (VALUES (30),(50)) c (club_id)
  EXCEPT
    SELECT stud_id, club_id
    FROM student_club
    WHERE club_id IN (30, 50)   -- optional. Not needed but may affect performance
  ) x ;   
查看更多
梦寄多情
4楼-- · 2018-12-31 01:19

@erwin-brandstetter Please, benchmark this:

SELECT s.stud_id, s.name
FROM   student s, student_club x, student_club y
WHERE  x.club_id = 30
AND    s.stud_id = x.stud_id
AND    y.club_id = 50
AND    s.stud_id = y.stud_id;

It's like number 6) by @sean , just cleaner, I guess.

查看更多
栀子花@的思念
5楼-- · 2018-12-31 01:20

Another CTE. It looks clean, but it will probably generate the same plan as a groupby in a normal subquery.

WITH two AS (
    SELECT student_id FROM tmp.student_club
    WHERE club_id IN (30,50)
    GROUP BY student_id
    HAVING COUNT(*) > 1
    )
SELECT st.* FROM tmp.student st
JOIN two ON (two.student_id=st.id)
    ;

For those who want to test, a copy of my generate testdata thingy:

DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp;

CREATE TABLE tmp.student
    ( id INTEGER NOT NULL PRIMARY KEY
    , sname VARCHAR
    );

CREATE TABLE tmp.club
    ( id INTEGER NOT NULL PRIMARY KEY
    , cname VARCHAR
    );

CREATE TABLE tmp.student_club
    ( student_id INTEGER NOT NULL  REFERENCES tmp.student(id)
    , club_id INTEGER NOT NULL  REFERENCES tmp.club(id)
    );

INSERT INTO tmp.student(id)
    SELECT generate_series(1,1000)
    ;

INSERT INTO tmp.club(id)
    SELECT generate_series(1,100)
    ;

INSERT INTO tmp.student_club(student_id,club_id)
    SELECT st.id  , cl.id
    FROM tmp.student st, tmp.club cl
    ;

DELETE FROM tmp.student_club
WHERE random() < 0.8
    ;

UPDATE tmp.student SET sname = 'Student#' || id::text ;
UPDATE tmp.club SET cname = 'Soccer' WHERE id = 30;
UPDATE tmp.club SET cname = 'Baseball' WHERE id = 50;

ALTER TABLE tmp.student_club
    ADD PRIMARY KEY (student_id,club_id)
    ;
查看更多
旧时光的记忆
6楼-- · 2018-12-31 01:21

If you just want student_id then:

    Select student_id
      from student_club
     where club_id in ( 30, 50 )
  group by student_id
    having count( student_id ) = 2

If you also need name from student then:

Select student_id, name
  from student s
 where exists( select *
                 from student_club sc
                where s.student_id = sc.student_id
                  and club_id in ( 30, 50 )
             group by sc.student_id
               having count( sc.student_id ) = 2 )

If you have more than two clubs in a club_selection table then:

Select student_id, name
  from student s
 where exists( select *
                 from student_club sc
                where s.student_id = sc.student_id
                  and exists( select * 
                                from club_selection cs
                               where sc.club_id = cs.club_id )
             group by sc.student_id
               having count( sc.student_id ) = ( select count( * )
                                                   from club_selection ) )
查看更多
笑指拈花
7楼-- · 2018-12-31 01:21
SELECT s.stud_id, s.name
FROM   student s,
(
select x.stud_id from 
student_club x 
JOIN   student_club y ON x.stud_id = y.stud_id
WHERE  x.club_id = 30
AND    y.club_id = 50
) tmp_tbl
where tmp_tbl.stud_id = s.stud_id
;

Use of fastest variant (Mr. Sean in Mr. Brandstetter chart). May be variant with only one join to only the student_club matrix has the right to live. So, the longest query will have only two columns to calculate, idea is to make the query thin.

查看更多
登录 后发表回答