Select a post that does not have a particular tag

Posted 2019-06-10 00:35

I have a post/tag database, with the usual post, tag, and tag_post tables. The tag_post table contains tagid and postid fields.

I need to query posts. When I want to fetch posts that have a certain tag, I have to use a join:

... INNER JOIN tag_post ON post.id = tag_post.postid 
WHERE tag_post.tagid = {required_tagid}

When I want to fetch posts that have tagIdA and tagIdB, I have to use two joins (which I kind of came to terms with eventually).
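For illustration, here is a minimal sketch of the two-join version, run through Python's sqlite3 module against an in-memory database. The sample rows and the tag ids 10 and 20 are made up for the example; only the table and column names come from the description above.

```python
import sqlite3

# Hypothetical sample data: post 1 has tags 10 and 20, post 2 has only tag 20.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE tag_post (tagid INTEGER, postid INTEGER);
    INSERT INTO post (id) VALUES (1), (2);
    INSERT INTO tag_post (tagid, postid) VALUES (10, 1), (20, 1), (20, 2);
""")

# One join per required tag; each alias must find its own matching row.
both_tags = conn.execute("""
    SELECT post.id
    FROM post
    INNER JOIN tag_post AS a ON post.id = a.postid AND a.tagid = ?
    INNER JOIN tag_post AS b ON post.id = b.postid AND b.tagid = ?
""", (10, 20)).fetchall()

print(both_tags)  # [(1,)]
```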

Now, I need to query posts that do not have a certain tag. Without much thought, I just changed the = to !=:

... INNER JOIN tag_post ON post.id = tag_post.postid 
WHERE tag_post.tagid != {certain_tagid}

Boom! Wrong logic!
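To make the failure concrete, here is a small sketch using Python's sqlite3 module and an in-memory database. The rows and tag ids are invented for the example. Post 1 carries tags 10 and 20, so its tagid 20 row still passes the != 10 filter and the post is returned even though it has the unwanted tag. Post 3 has no tags at all, so the inner join drops it.

```python
import sqlite3

# Hypothetical sample data: post 1 has tags 10 and 20, post 2 has tag 20,
# post 3 has no tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE tag_post (tagid INTEGER, postid INTEGER);
    INSERT INTO post (id) VALUES (1), (2), (3);
    INSERT INTO tag_post (tagid, postid) VALUES (10, 1), (20, 1), (20, 2);
""")

# The != filter is applied per joined row, not per post.
rows = conn.execute("""
    SELECT post.id, tag_post.tagid
    FROM post
    INNER JOIN tag_post ON post.id = tag_post.postid
    WHERE tag_post.tagid != ?
    ORDER BY post.id
""", (10,)).fetchall()

print(rows)  # [(1, 20), (2, 20)]
```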

I did come up with this - just writing the logic here:

... INNER JOIN tag_post ON post.id = tag_post.postid 
WHERE tag_post.postid NOT IN 
(SELECT postid from tag_post where tagid = {certain_tagid})

I know this will work, but due to the way I've been brought up, I feel guilty (justified or not) whenever I write a query with a subquery.

Suggest a better way to do this?

3 Answers
老娘就宠你
#2 · 2019-06-10 00:53

In addition to Gavin Towey's good answer, you can use a not exists subquery:

where   not exists
        (
        select  *
        from    tag_post
        where   post.id = tag_post.postid
                and tag_post.tagid = {required_tagid}
        )

The database typically executes both variants in the same way. I personally find the not exists approach easier to read.
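As a rough sketch, the not exists variant can be tried out with Python's sqlite3 module. The in-memory data and the tag id 10 are assumptions made for the example, and SQLite stands in for whatever database is actually in use.

```python
import sqlite3

# Hypothetical sample data: post 1 has tags 10 and 20, post 2 has tag 20,
# post 3 has no tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE tag_post (tagid INTEGER, postid INTEGER);
    INSERT INTO post (id) VALUES (1), (2), (3);
    INSERT INTO tag_post (tagid, postid) VALUES (10, 1), (20, 1), (20, 2);
""")

# Keep a post only if no tag_post row links it to the unwanted tag.
rows = conn.execute("""
    SELECT post.id
    FROM post
    WHERE NOT EXISTS (
        SELECT *
        FROM tag_post
        WHERE post.id = tag_post.postid
          AND tag_post.tagid = ?
    )
    ORDER BY post.id
""", (10,)).fetchall()

print(rows)  # [(2,), (3,)]
```

Each post appears at most once, and the untagged post 3 is included.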

闹够了就滚
#3 · 2019-06-10 01:03

You can think of it as "find all rows in posts that do not have a match in tags (for a specific tag)"

This is the textbook use case for a LEFT JOIN.

LEFT JOIN tag_post ON post.id = tag_post.postid AND tag_post.tagid = {required_tagid}
WHERE tag_post.tagid IS NULL

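Note that the tag id condition has to be in the ON clause of the join, not in the WHERE clause. In the WHERE clause it would filter out exactly the unmatched (NULL) rows that the IS NULL test is looking for.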

For a reference on join types, see here: http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html
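Here is a minimal sketch of the LEFT JOIN approach using Python's sqlite3 module, with made-up sample rows and tag id 10 as the tag to exclude. It also runs a variant with the tag condition moved to the WHERE clause, to show why the note about the ON clause matters.

```python
import sqlite3

# Hypothetical sample data: post 1 has tags 10 and 20, post 2 has tag 20,
# post 3 has no tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE tag_post (tagid INTEGER, postid INTEGER);
    INSERT INTO post (id) VALUES (1), (2), (3);
    INSERT INTO tag_post (tagid, postid) VALUES (10, 1), (20, 1), (20, 2);
""")

# Tag condition in ON: posts without tag 10 get a NULL-extended row.
anti_join = conn.execute("""
    SELECT post.id
    FROM post
    LEFT JOIN tag_post
           ON post.id = tag_post.postid AND tag_post.tagid = ?
    WHERE tag_post.tagid IS NULL
    ORDER BY post.id
""", (10,)).fetchall()

# Tag condition in WHERE: it contradicts IS NULL, so nothing survives.
tag_in_where = conn.execute("""
    SELECT post.id
    FROM post
    LEFT JOIN tag_post ON post.id = tag_post.postid
    WHERE tag_post.tagid = ? AND tag_post.tagid IS NULL
""", (10,)).fetchall()

print(anti_join)     # [(2,), (3,)]
print(tag_in_where)  # []
```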

唯我独甜
#4 · 2019-06-10 01:10
  1. When I want to fetch posts that have tagIdA and tagIdB, I have to use two joins (which I kind of came to terms with eventually).

    There are other ways.

    One can obtain the ids of all posts that are tagged with both tagid 123 and 456 by filtering tag_post for only those tags, grouping by post, and then dropping any groups that contain fewer tags than expected; one can then use the result to filter the posts table:

    SELECT * FROM post WHERE id IN (
      SELECT   postid
      FROM     tag_post
      WHERE    tagid IN (123,456)
      GROUP BY postid
      HAVING   COUNT(*) = 2
    )
    

    If a post can be tagged with the same tagid multiple times, you will need to replace COUNT(*) with the less performant COUNT(DISTINCT tagid).

  2. Now, I need to query posts that do not have a certain tag.

    This is known as an anti-join. The easiest way is to replace IN from the query above with NOT IN, as you proposed. I wouldn't feel too guilty about it. The alternative is to use an outer join, as proposed in @GavinTowey's answer.
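Both points can be checked with a short sketch using Python's sqlite3 module. The sample rows are invented, with tag ids 10 and 20 standing in for 123 and 456, and post 3 deliberately tagged twice with the same tag to show the difference between COUNT(*) and COUNT(DISTINCT tagid).

```python
import sqlite3

# Hypothetical sample data: post 1 has tags 10 and 20, post 2 has tag 20,
# post 3 has tag 10 twice (a duplicate row), post 4 has no tags.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY);
    CREATE TABLE tag_post (tagid INTEGER, postid INTEGER);
    INSERT INTO post (id) VALUES (1), (2), (3), (4);
    INSERT INTO tag_post (tagid, postid)
    VALUES (10, 1), (20, 1), (20, 2), (10, 3), (10, 3);
""")

def posts_with_both(count_expr):
    # count_expr is a fixed string chosen below, never user input.
    sql = f"""
        SELECT id FROM post WHERE id IN (
            SELECT   postid
            FROM     tag_post
            WHERE    tagid IN (10, 20)
            GROUP BY postid
            HAVING   {count_expr} = 2
        ) ORDER BY id
    """
    return [row[0] for row in conn.execute(sql)]

with_star = posts_with_both("COUNT(*)")                  # fooled by the duplicate
with_distinct = posts_with_both("COUNT(DISTINCT tagid)")

# Anti-join: the same shape with NOT IN and a single unwanted tag.
without_tag_10 = [row[0] for row in conn.execute("""
    SELECT id FROM post WHERE id NOT IN (
        SELECT postid FROM tag_post WHERE tagid = 10
    ) ORDER BY id
""")]

print(with_star)       # [1, 3]
print(with_distinct)   # [1]
print(without_tag_10)  # [2, 4]
```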
