Optimizing mysql query (likes/dislikes)

2019-07-27 02:18发布

My website contains pieces of content on which users can vote (like/dislike similar to reddit upvotes). When selecting an individual piece of content, I run the following subqueries to get the number of likes, the number of dislikes and the current user's vote.

The votes are stored in a separate table {contentId, userId, vote}

SELECT

[... BUNCH OF FIELDS ...]

(SELECT COUNT(*) FROM votes vt WHERE vt.cId = c.contentId AND vote = '.Constants::LIKE.') AS likes,
(SELECT COUNT(*) FROM votes vt WHERE vt.cId = c.contentId AND vote = '.Constants::DISLIKE.') AS dislikes,
COALESCE((SELECT vote FROM votes vt WHERE vt.cId = c.contentId AND userId = '.USER_ID.'), '.Constants::NO_VOTE.') AS myVote

FROM content

[... OTHER STUFF ... ]

Is there a better way to achieve this (combine those subqueries or otherwise)?

2条回答
贼婆χ
2楼-- · 2019-07-27 02:33

Your problem is simple your current sub queries are running for every single row returned.

You need to join to that data instead. You will need to change the code I've added to give you the correct counts but this should point you in the right direction.

SELECT
    BLAH
    Likes,
    Dislikes
FROM CONTENT as C
INNER JOIN (
    SELECT 
        cID,
        COUNT(votes) as Likes, --you will need to alter this 
        COUNT(votes) as Dislikes --to count your up and downvotes
    FROM Votes
    GROUP BY cID
    ) AS V
    ON  V.cID = C.ContentID
查看更多
相关推荐>>
3楼-- · 2019-07-27 02:44

In terms of performance, those correlated subqueries can eat your lunch. And devour your lunchbox too, for large sets, because of the way MySQL processes them. Each of those subqueries gets executed for every row returned in the outer query. And that can get very expensive for large sets.

An alternative approach is to use an inline view to materialize the likes and dislikes for all content, and then do a join operation to that.

But, this approach can be expensive too, particularly when you are only needing the vote "counts" for just a few content rows, out of a bazillion rows. Often, there is a predicate from the outer query that can also be incorporated into the inline view, to limit the number of rows that need to be examined and returned.

We want to use an OUTER join to that inline view, so it returns a result equivalent to your query; returning a row from content when there are no matching rows in the vote table.

SELECT [... BUNCH OF FIELDS ...]
     , COALESCE(v.likes,0) AS likes
     , COALESCE(v.dislikes,0) AS dislikes
     , COALESCE(v.myvote,'.Constants::NO_VOTE.') AS myvote
  FROM content c
  LEFT
  JOIN ( SELECT vt.cId
              , SUM(vt.vote = '.Constants::LIKE.') AS likes
              , SUM(vt.vote = '.Constants::DISLIKE.') AS dislikes
              , MAX(IF(vt.userId = '.USER_ID.',vt.vote,NULL)) AS myvote
           FROM votes vt
          GROUP
             BY vt.cId
       ) v
    ON v.cId = c.contentId

       [... OTHER STUFF ... ]

Note that the inline view query (aliased as v) is going to look at EVERY single row from the votes table. If you only need a subset, then consider adding an appropriate predicate (either in a WHERE clause or as a JOIN to another table). There's no indication from the [... OTHER STUFF ...] in your query whether it's returning just a few rows from content or if you are needing all of the rows because you are ordering by likes, etc.

For a small number of rows selected from the content table, using the correlated subqueries (like in your query) can actually be faster than materializing a huge inline view and performing a join operation against it.

Oh... and for both queries, it goes without saying that an appropriate index on the votes table with a leading column of cId will benefit performance. For the inline view, you don't want the overhead of MySQL having to perform a filesort operation on all of those rows to do the GROUP BY. And for the correlated subqueries, you want them to use a index range scan, not a full scan.

查看更多
登录 后发表回答