Assuming I have the tables student
, club
, and student_club
:
student {
id
name
}
club {
id
name
}
student_club {
student_id
club_id
}
I want to know how to find all students in both the soccer (30) and baseball (50) club.
While this query doesn't work, it's the closest thing I have so far:
SELECT student.*
FROM student
INNER JOIN student_club sc ON student.id = sc.student_id
LEFT JOIN club c ON c.id = sc.club_id
WHERE c.id = 30 AND c.id = 50
I was curious. And as we all know, curiosity has a reputation for killing cats.
So, which is the fastest way to skin a cat?
The precise cat-skinning environment for this test:
student.id
isstudent.stud_id
andclub.id
isclub.club_id
here.Relevant indexes (should be the optimum - as long as we lack fore-knowledge which clubs will be queried):
club_pkey
is not required by most queries here.Primary keys implement unique indexes automatically In PostgreSQL.
The last index is to make up for this known shortcoming of multi-column indexes on PostgreSQL:
Results:
Total runtimes from EXPLAIN ANALYZE.
1) Martin 2: 44.594 ms
2) Erwin 1: 33.217 ms
3) Martin 1: 31.735 ms
4) Derek: 2.287 ms
5) Erwin 2: 2.181 ms
6) Sean: 2.043 ms
The last three perform pretty much the same. 4) and 5) result in the same query plan.
Late Additions:
Fancy SQL, but the performance can't keep up.
7) ypercube 1: 148.649 ms
8) ypercube 2: 147.497 ms
As expected, those two perform almost the same. Query plan results in table scans, the planner doesn't find a way to use the indexes here.
9) wildplasser 1: 49.849 ms
Fancy SQL, decent performance for a CTE. Very exotic query plan.
Again, would be interesting how 9.1 handles this. I am going to upgrade the db cluster used here to 9.1 soon. Maybe I'll rerun the whole shebang ...
10) wildplasser 2: 36.986 ms
CTE variant of query 2). Surprisingly, it can result in a slightly different query plan with the exact same data. I found a sequential scan on
student
, where the subquery-variant used the index.11) ypercube 3: 101.482 ms
Another late addition @ypercube. It is positively amazing, how many ways there are.
12) erwin 3: 2.377 ms
@ypercube's 11) is actually just the mind-twisting reverse approach of this simpler variant, that was also still missing. Performs almost as fast as the top cats.
13) erwin 4: 2.375 ms
Hard to believe, but here's another, genuinely new variant. I see potential for more than two memberships, but it also ranks among the top cats with just two.
Dynamic number of club memberships
In other words: varying number of filters. This question asked for exactly two club memberships. But many use cases have to prepare for a varying number.
Detailed discussion in this related later answer:
The query plan:
So it still seems to want the seq scan on student.
Different query plans in query 2) and 10)
I tested in a real life db, so the names differ from the catskin list. It's a backup copy, so nothing changed during all test runs (except minor changes to the catalogs).
Query 2)
Query 10)
This seems to perform reasonably well, since the CTE-scan avoids the need for two separate subqueries.
There is always a reason to misuse recursive queries!
(BTW: mysql does not seem to have recursive queries)
Or a more general solution easier to extend to
n
clubs and that avoidsINTERSECT
(not available in MySQL) andIN
(as performance of this sucks in MySQL)