I have a post/tag database, with the usual post, tag, and tag_post tables. The tag_post table contains tagid and postid fields.
I need to query posts. When I want to fetch posts that have a certain tag, I have to use a join:
... INNER JOIN tag_post ON post.id = tag_post.postid
WHERE tag_post.tagid = {required_tagid}`
When I want to fetch posts that have tagIdA and tagIdB, I have to use two joins (which I kind of came to terms with eventually).
Now, I need to query posts that do not have a certain tag. Without much thought, I just changed the =
to !=
:
... INNER JOIN tag_post ON post.id = tag_post.postid
WHERE tag_post.tagid != {certain_tagid}`
Boom! Wrong logic!
I did come up with this - just writing the logic here:
... INNER JOIN tag_post ON post.id = tag_post.postid
WHERE tag_post.postid NOT IN
(SELECT postid from tag_post where tagid = {certain_tagid})
I know this will work, but due to the way I've been brought up, I feel guilty (justified or not) whenever I write a query with a subquery.
Suggest a better way to do this?
In addition to Gavin Towey's good answer, you can use a
not exists
subquery:The database typically executes both variants in the same way. I personally find the
not exists
approach easier to read.You can think of it as "find all rows in posts that do not have a match in tags (for a specific tag)"
This is the textbook use case for a LEFT JOIN.
Note that you have to have the tag id in the ON clause of the join.
For a reference on join types, see here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
There are other ways.
One can obtain all the
id
of all posts that are tagged with bothtagid
123 and 456 by grouping filteringtag_post
for only those tags, grouping by post and then dropping any groups that contain fewer tags than expected; then one can use the result to filter theposts
table:If a post can be tagged with the same
tagid
multiple times, you will need to replaceCOUNT(*)
with the less performantCOUNT(DISTINCT tagid)
.This is known as an anti-join. The easiest way is to replace
IN
from the query above withNOT IN
, as you proposed. I wouldn't feel too guilty about it. The alternative is to use an outer join, as proposed in @GavinTowey's answer.