I have 3 tables. Below is the structure:
student (id int, name varchar(20))
course (course_id int, subject varchar(10))
student_course (st_id int, course_id int) -> records which student (st_id) enrolled in which course (course_id)
Now, I want to write a query to find the students who did not enroll in any course. As far as I can tell, there are multiple ways to fetch this information. Could you please tell me which of these is the most efficient, and why? Also, if there is a better way to do this, please let me know.
db2 => select distinct name from student inner join student_course on id not in (select st_id from student_course)
db2 => select name from student minus (select name from student inner join student_course on id=st_id)
db2 => select name from student where id not in (select st_id from student_course)
Thanks in advance!!
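If you want to try the three queries yourself, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. Assumptions: the sample rows (Ann, Bob, Cid and two courses) are made up, and because SQLite has no MINUS, the second query is spelled with the standard EXCEPT operator and without the parentheses. This only checks which rows come back; it says nothing about how DB2 plans or times them.

```python
import sqlite3


def make_db(enrollments):
    """Recreate the question's schema in memory; the sample rows are made up."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE student (id INTEGER, name VARCHAR(20));
        CREATE TABLE course (course_id INTEGER, subject VARCHAR(10));
        CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
        INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
        INSERT INTO course VALUES (10, 'math'), (20, 'art');
    """)
    con.executemany("INSERT INTO student_course VALUES (?, ?)", enrollments)
    return con


# The asker's three queries; MINUS is written EXCEPT because SQLite lacks MINUS.
QUERIES = {
    "join_not_in": """SELECT DISTINCT name FROM student
                      INNER JOIN student_course
                      ON id NOT IN (SELECT st_id FROM student_course)""",
    "except": """SELECT name FROM student
                 EXCEPT
                 SELECT name FROM student INNER JOIN student_course ON id = st_id""",
    "not_in": """SELECT name FROM student
                 WHERE id NOT IN (SELECT st_id FROM student_course)""",
}


def run_all(enrollments):
    """Run every query against a fresh database and return sorted names per query."""
    con = make_db(enrollments)
    return {key: sorted(row[0] for row in con.execute(sql)) for key, sql in QUERIES.items()}


with_data = run_all([(1, 10), (1, 20), (2, 10)])  # Cid has no course
empty = run_all([])                               # nobody has a course
print(with_data)
print(empty)
```

On this data all three return Cid. With an empty student_course, however, the first query returns nothing at all: the inner join has no rows to filter, so every student disappears even though none of them is enrolled.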
The subqueries you use, whether with NOT IN, MINUS or anything else, are generally inefficient. The common way to do this is a LEFT JOIN:
select name
from student
left join student_course on id = st_id
where st_id is NULL
Using a join is the "normal" and preferred solution.
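A minimal runnable sketch of that LEFT JOIN, using Python's sqlite3 with made-up sample data (SQLite stands in for DB2 here, so this shows the result, not DB2's plan). The table aliases s and sc are added only for readability.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (id INTEGER, name VARCHAR(20));
    CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- Ann takes two courses, Bob one, Cid none (made-up sample data)
    INSERT INTO student_course VALUES (1, 10), (1, 20), (2, 10);
""")

# LEFT JOIN keeps every student; a student without a match gets NULL in st_id,
# and the WHERE clause keeps only those unmatched rows (an "anti-join").
LEFT_ANTI_JOIN = """
    SELECT s.name
    FROM student s
    LEFT JOIN student_course sc ON s.id = sc.st_id
    WHERE sc.st_id IS NULL
"""
unenrolled = [row[0] for row in con.execute(LEFT_ANTI_JOIN)]
print(unenrolled)  # ['Cid']
```

Because a student with no enrollment produces exactly one NULL-extended row, the result needs no DISTINCT.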
The canonical (maybe even synoptic) idiom is (IMHO) to use NOT EXISTS:
SELECT *
FROM student st
WHERE NOT EXISTS (
SELECT *
FROM student_course nx
WHERE st.id = nx.st_id
);
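Here is the NOT EXISTS query run through Python's sqlite3 on made-up data (SQLite as a stand-in for DB2). The sketch declares nx as the alias of student_course, because the correlated WHERE clause refers to it, and it prints the column names from cursor.description next to those of a SELECT * on the LEFT JOIN form, for comparison.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (id INTEGER, name VARCHAR(20));
    CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO student_course VALUES (1, 10), (1, 20), (2, 10);
""")

NOT_EXISTS = """
    SELECT *
    FROM student st
    WHERE NOT EXISTS (
        SELECT *
        FROM student_course nx
        WHERE st.id = nx.st_id
    )
"""
LEFT_JOIN_STAR = """
    SELECT *
    FROM student st
    LEFT JOIN student_course nx ON st.id = nx.st_id
    WHERE nx.st_id IS NULL
"""


def columns_and_rows(sql):
    """Return (column names, rows); description item 0 is each column's name."""
    cur = con.execute(sql)
    return [d[0] for d in cur.description], cur.fetchall()


ne_cols, ne_rows = columns_and_rows(NOT_EXISTS)
lj_cols, lj_rows = columns_and_rows(LEFT_JOIN_STAR)
print(ne_cols, ne_rows)  # ['id', 'name'] [(3, 'Cid')]
print(lj_cols, lj_rows)  # ['id', 'name', 'st_id', 'course_id'] [(3, 'Cid', None, None)]
```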
Advantages:
- NOT EXISTS(...) is very old, and most optimisers will know how to handle it; thus it will probably be present on all platforms.
- The nx. correlation name is not leaked into the outer query: the SELECT * in the outer query will only yield fields from the student table, and not the (NULL) columns from the student_course table, as in the LEFT JOIN ... WHERE ... IS NULL case. This is especially useful in queries with a large number of range table entries.
- (NOT) IN is error-prone (NULLs), and it might perform badly on some implementations (duplicates and NULLs have to be removed from the result of the uncorrelated subquery).
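To make the NULL problem with NOT IN concrete, here is a small sketch (Python's sqlite3, made-up rows) in which student_course contains one row with a NULL st_id. That NULL row is an assumption for illustration; whether it can occur depends on whether st_id is declared NOT NULL, which the question's schema does not do.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (id INTEGER, name VARCHAR(20));
    CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    -- The last row has a NULL st_id, e.g. a half-entered enrollment (made-up data)
    INSERT INTO student_course VALUES (1, 10), (2, 10), (NULL, 20);
""")

NOT_IN = "SELECT name FROM student WHERE id NOT IN (SELECT st_id FROM student_course)"
NOT_EXISTS = """SELECT name FROM student st
                WHERE NOT EXISTS (SELECT 1 FROM student_course nx WHERE st.id = nx.st_id)"""

# For Cid: 3 NOT IN (1, 2, NULL) means 3 <> 1 AND 3 <> 2 AND 3 <> NULL,
# i.e. TRUE AND TRUE AND UNKNOWN = UNKNOWN, so WHERE rejects the row.
not_in_rows = [r[0] for r in con.execute(NOT_IN)]
# NOT EXISTS only asks whether a matching row exists; NULL = 3 never matches.
not_exists_rows = [r[0] for r in con.execute(NOT_EXISTS)]
print(not_in_rows)      # []
print(not_exists_rows)  # ['Cid']
```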
Using NOT IN is generally slow. That makes your second query (the MINUS one) the most efficient. You probably don't need the parentheses, though.
Just as a side note: I would suggest selecting student ids (which are unique) rather than names.
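One concrete reason to prefer ids, sketched with Python's sqlite3 on made-up data: if two students share a name and only one of them is enrolled, a set difference on names removes both. SQLite spells MINUS as EXCEPT, so the sketch uses EXCEPT.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (id INTEGER, name VARCHAR(20));
    CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
    -- Two different students share the name 'Ann'; only id 1 is enrolled
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (4, 'Ann');
    INSERT INTO student_course VALUES (1, 10), (2, 10);
""")

# Set difference on names: the enrolled Ann also removes the unenrolled Ann.
BY_NAME = """SELECT name FROM student
             EXCEPT
             SELECT name FROM student INNER JOIN student_course ON id = st_id"""
# Set difference on the unique id keeps the unenrolled Ann (id 4).
BY_ID = """SELECT id FROM student
           EXCEPT
           SELECT st_id FROM student_course"""

by_name = [r[0] for r in con.execute(BY_NAME)]
by_id = [r[0] for r in con.execute(BY_ID)]
print(by_name)  # []
print(by_id)    # [4]
```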
As another query option, you might want to LEFT JOIN the two tables, group by the student id and keep only the groups with HAVING count(course_id) = 0.
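A sketch of the grouping variant (Python's sqlite3, made-up data). It only works with an outer join: with an inner join an unenrolled student has no rows at all, so there is no group whose count could be 0. It also has to count a column of student_course rather than use COUNT(*).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (id INTEGER, name VARCHAR(20));
    CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO student_course VALUES (1, 10), (1, 20), (2, 10);
""")

# LEFT JOIN keeps Cid with a NULL course_id; COUNT(column) skips NULLs, so
# Cid's group counts 0. COUNT(*) would count the NULL-extended row as 1.
GROUPED = """
    SELECT s.id, s.name
    FROM student s
    LEFT JOIN student_course sc ON s.id = sc.st_id
    GROUP BY s.id, s.name
    HAVING COUNT(sc.course_id) = 0
"""
left_rows = con.execute(GROUPED).fetchall()
# Same query with an inner join: only enrolled students form groups.
inner_rows = con.execute(GROUPED.replace("LEFT JOIN", "INNER JOIN")).fetchall()
print(left_rows)   # [(3, 'Cid')]
print(inner_rows)  # []
```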
Also, I agree that indexes will be more important.
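To check whether an index is actually used, you can ask the optimizer for its plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3; the index name idx_sc_st_id is hypothetical, and DB2 has its own EXPLAIN facility whose output looks different. Only the presence of the index name in the plan matters here, since the wording of the plan text varies between SQLite versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (id INTEGER, name VARCHAR(20));
    CREATE TABLE student_course (st_id INTEGER, course_id INTEGER);
    -- Hypothetical index on the join column, so each NOT EXISTS probe
    -- can be an index lookup instead of a scan of student_course.
    CREATE INDEX idx_sc_st_id ON student_course (st_id);
""")

PLAN = """
    EXPLAIN QUERY PLAN
    SELECT name FROM student st
    WHERE NOT EXISTS (SELECT 1 FROM student_course nx WHERE nx.st_id = st.id)
"""
# Each plan row has four columns; the last one (index 3) is the readable detail.
details = [row[3] for row in con.execute(PLAN)]
for d in details:
    print(d)
```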