How to optimise MySQL queries based on EXPLAIN pla

2019-01-17 11:15发布

问题:

Looking at a query's EXPLAIN plan, how does one determine where optimisations can best be made?

I appreciate that one of the first things to check is whether good indexes are being used, but beyond that I'm a little stumped. Through trial and error in the past I have sometimes found that the order in which joins are conducted can be a good source of improvement, but how can one determine that from looking at the execution plan?

Whilst I would very much like to gain a good general understanding of how to optimise queries (suggested reading much appreciated!), I also realise that it's often easier to discuss concrete cases than talk in the abstract. Since I am currently banging my head against the wall with this one, your thoughts would be much appreciated:

id   select_type   table   type     possible_keys    key       key_len   ref                    rows   Extra
 1   SIMPLE        S       const    PRIMARY,l,p,f4   PRIMARY         2   const                     1   Using temporary
 1   SIMPLE        Q       ref      PRIMARY,S        S               2   const                   204   Using index
 1   SIMPLE        V       ref      PRIMARY,n,Q      Q               5   const,db.Q.QID            6   Using where; Using index; Distinct
 1   SIMPLE        R1      ref      PRIMARY,L        L             154   const,db.V.VID          447   Using index; Distinct
 1   SIMPLE        W       eq_ref   PRIMARY,w        PRIMARY         5   const,db.R.RID,const      1   Using where; Distinct
 1   SIMPLE        R2      eq_ref   PRIMARY,L        PRIMARY       156   const,db.W.RID,const      1   Using where; Distinct

Am I correct in interpreting the final row of the execution plan as follows:

  • as it is fully matched on its primary key, only one row of R2 need be fetched per output row;
  • however, such output rows are then filtered based on some criteria that applies to R2?

If so, my problem lies in the filtering which occurs in that final step. If the condition results in no filtering (e.g. WHERE `Col_1_to_3` IN (1,2,3)), the query runs extremely quickly (~50ms); however, if the condition restricts the rows selected (WHERE `Col_1_to_3` IN (1,2)), the query takes considerably longer (~5s). If the restriction is to a single match (WHERE `Col_1_to_3` IN (1)), the optimiser suggests an altogether different execution plan (which performs marginally better than 5s, but still a lot worse than 50ms). It doesn't seem as though there's a better index that can be used on that table (given it's already fully using the primary key to return one row per result?).

How should one interpret all this information? Am I right in guessing that, because such output filtering is taking place on the final table to be joined, considerable effort is wasted versus joining the table earlier and filtering such rows sooner? If so, how does one determine when in the execution plan R2 ought to be joined?

Whilst I resisted including the query & schema in full here (as I would really likely to know what to look for, not merely be told the answer), I understand it's necessary to advance the discussion:

SELECT DISTINCT
    `Q`.`QID`
FROM
    `S`
    NATURAL JOIN `Q`
    NATURAL JOIN `V`
    NATURAL JOIN `R` AS `R1`
    NATURAL JOIN `W`

    JOIN `R` AS `R2` ON (
            `R2`.`SID` = `S`.`SID`
        AND `R2`.`RID` = `R1`.`RID`
        AND `R2`.`VID` = `S`.`V_id`
        AND `R2`.`Col_1_to_3` IN (1,2) -- this is where performance suffers!
    )

WHERE
    AND `S`.`SID` = @x
    AND `W`.`WID` = @y
;

The definition of table R is:

CREATE TABLE `R` (
  `SID` smallint(6) unsigned NOT NULL,
  `RID` smallint(6) unsigned NOT NULL,
  `VID` varchar(50) NOT NULL DEFAULT '',
  `Col_1_to_3` smallint(1) DEFAULT NULL,
  `T` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`SID`,`RID`,`VID`),
  KEY `L` (`SID`,`VID`,`Col_1_to_3`),
  CONSTRAINT `R_f1` FOREIGN KEY (`SID`) REFERENCES `S` (`SID`),
  CONSTRAINT `R_f2` FOREIGN KEY (`SID`, `VID`) REFERENCES `V` (`SID`, `VID`),
  CONSTRAINT `R_f3` FOREIGN KEY (`SID`, `VID`, `Col_1_to_3`) REFERENCES `L` (`SID`, `VID`, `LID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

回答1:

Depends on what you're going for and what the query is.

Generally, for every line in EXPLAIN that has a Using where, you need to have it using an index (possible keys and keys column). These are your filters and include WHERE and ON. Having it say Using index is even better. It means there's a covering index, and MySQL can retrieve the data right from the index rather than visiting the row in the table data.

The lines where there is no Using where, and it is returning a large number of rows should be looked at. These are returning values for all rows in the table. I don't know what your query is, so I don't know whether to be alarmed here. Try filtering the result set to reduce the size and improve performance.

You generally should try to avoid seeing Using filesort or Using temporary, though those are only bad if you're not expecting them.

Filesort usually appears with the ORDER clause. You generally want MySQL to use a covering index (Using index) so that the rows are returned already in order from the server. If they're not, then MySQL must order them afterward, using filesort.

Using temporary can be bad when it refers to derived tables because they don't have indexes. It seems that you've explicitly created a temporary table with indexes, so here, it's not bad. Sometimes, your only choice is to use a derived table, and hence Using temporary.