MySQL optimization query with subqueries

2019-06-01 03:32发布

问题:

Today i received email from my hosting account saying that i need to tweak my query:

SELECT
  `id`, `nick`, `msg`, `uid`, `show_pic`,
  `time`,`ip`,`time_updated`,
  (SELECT COUNT(c.msg_id)
   FROM `the_ans` c
   where c.msg_id = d.id) AS counter,
  (SELECT c.msg
   FROM `the_ans` c
   WHERE c.msg_id=d.id
   ORDER BY `time` DESC LIMIT 1) as lastmsg
FROM
  `the_data` d
ORDER BY `time_updated` DESC LIMIT 26340 ,15

EXPLAIN:

id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY d ALL 34309 Using filesort
3 DEPENDENT SUBQUERY c ALL 43659 Using where; Using filesort
2 DEPENDENT SUBQUERY c ALL 43659 Using where

This query examines 65,396,669,012,829 rows, which is unacceptable in shared hosting.

tbh, i don't understand their explanation.. what the query actually does is to get 15 posts order by time updated, for each post i grab the latest comment, count all comments for each post.

posts table - 'the_data'

comments table = 'the_ans'

i'm not a mysql guru and i don't know how to improve this query any help will be appreciated

thx

the query

SELECT
  `id` , `nick` , `msg` , `uid` , `show_pic` , `time` , `ip` , `time_updated` , (
    SELECT COUNT( c.msg_id )
    FROM `the_ans` c
    WHERE c.msg_id = d.id
   ) AS counter, (
    SELECT c.msg
    FROM `the_ans` c
    WHERE c.msg_id = d.id
    ORDER BY `time` DESC
    LIMIT 1
   ) AS lastmsg
FROM `the_data` d
ORDER BY `time_updated` DESC
LIMIT 26340 , 15 

this is the results structure

id| nick  | msg  | uid   | show_pick | time      | ip |time_updated|counter|lastmsg
  |       |      |       |           |           |    |            |       |
7 | jqman | hello| 10074 |   0       |2013-21-01 | 12 |2013-21-01  | 55    |blah bl

回答1:

A quick glance at the explain plan shows that there are no suitable indexes for MySQL to use, so it's resorting to full table scans.

 EXPLAIN: 
 id select_type        table type possible_keys key key_len ref rows  Extra 
 -- ------------------ ----- ---- ------------- --- ------- --- ----- ---------------------------- 
 1  PRIMARY            d     ALL                                34309 Using filesort
 3  DEPENDENT SUBQUERY c     ALL                                43659 Using where; Using filesort 
 2  DEPENDENT SUBQUERY c     ALL                                43659 Using where

To optimize the execution of the existing query, you need to add appropriate indexes. Likely candidates:

ON `the_data`(`time_updated`)
ON `the_ans`(`msg_id`,`time`)

Those indexes will significantly improve the performance of both the outer query (likely eliminating the sort operation), and the numerous executions of the correlated subqueries.


Beyond that, you're going to need to change the query to improve performance. The LIMIT clause on the outermost query is being applied after the entire resultset is prepared, which means those two correlated subqueries are getting executed for every row in table the_data. And that's going to eat your lunch, performance wise.

To have those correlated subqueries run only for the (up to) 15 rows that are being returned, you need to get that LIMIT clause applied before those subqueries get run.

This query should return an equivalent resultset, and will avoid 34,000+ executions of each correlated subquery, which should improve performance considerably:

SELECT d.*
     , ( SELECT COUNT( c.msg_id )
           FROM `the_ans` c
          WHERE c.msg_id = d.id
       ) AS counter
     , ( SELECT c.msg
           FROM `the_ans` c
          WHERE c.msg_id = d.id
          ORDER BY `time` DESC
          LIMIT 1
       ) AS lastmsg
  FROM ( SELECT e.`id` 
              , e.`nick`
              , e.`msg`
              , e.`uid`
              , e.`show_pic`
              , e.`time`
              , e.`ip`
              , e.`time_updated` 
           FROM `the_data` e
          ORDER
             BY e.`time_updated` DESC
          LIMIT 26340 , 15 
       ) d
 ORDER BY d.`time_updated` DESC

(Your current query executes each of those correlated subqueries "SELECT COUNT(1) FROM the_data" times. With the rewritten query above, each of those subqueries will be executed only 15 times.)



回答2:

Perform the correlated subqueries after selecting the time-limited rows from the main query:

SELECT d.*,
       (SELECT COUNT(c.msg_id)
        FROM `the_ans` c
        where c.msg_id = d.id) AS counter,
       (SELECT c.msg
        FROM `the_ans` c
        WHERE c.msg_id=d.id
        ORDER BY `time` DESC LIMIT 1) as lastmsg
FROM (SELECT
        `id`, `nick`, `msg`, `uid`, `show_pic`,
        `time`,`ip`,`time_updated`
      FROM
        `the_data`
      ORDER BY `time_updated` DESC LIMIT 26340 ,15) d

Also, make sure you have indexes on time_updated and msg_id.



回答3:

Something like this should give you the result a bit quicker.

Note that this is using INNER JOINs as it is intended to work when every record on *the_data* has at least one matching record on *the_ans*

SELECT `id` , `nick` , `msg` , `uid` , `show_pic` , `time` , `ip` , `time_updated` , Sub1.counter, c.msg AS lastmsg
FROM `the_data` d
INNER JOIN (SELECT msg_id, COUNT( * ) AS counter, MAX( `time` ) AS MaxTime FROM `the_ans` GROUP BY msg_id) Sub1 ON d.id = Sub1.msg_id
INNER JOIN the_ans c ON d.id = c.msg_id AND sub1.MaxTime = c.`time`
ORDER BY `time_updated` DESC
LIMIT 26340 , 15