Looking at a query's EXPLAIN
plan, how does one determine where optimisations can best be made?
I appreciate that one of the first things to check is whether good indexes are being used, but beyond that I'm a little stumped. Through trial and error in the past I have sometimes found that the order in which joins are conducted can be a good source of improvement, but how can one determine that from looking at the execution plan?
Whilst I would very much like to gain a good general understanding of how to optimise queries (suggested reading much appreciated!), I also realise that it's often easier to discuss concrete cases than talk in the abstract. Since I am currently banging my head against the wall with this one, your thoughts would be much appreciated:
id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE S const PRIMARY,l,p,f4 PRIMARY 2 const 1 Using temporary 1 SIMPLE Q ref PRIMARY,S S 2 const 204 Using index 1 SIMPLE V ref PRIMARY,n,Q Q 5 const,db.Q.QID 6 Using where; Using index; Distinct 1 SIMPLE R1 ref PRIMARY,L L 154 const,db.V.VID 447 Using index; Distinct 1 SIMPLE W eq_ref PRIMARY,w PRIMARY 5 const,db.R.RID,const 1 Using where; Distinct 1 SIMPLE R2 eq_ref PRIMARY,L PRIMARY 156 const,db.W.RID,const 1 Using where; Distinct
Am I correct in interpreting the final row of the execution plan as follows:
- as it is fully matched on its primary key, only one row of
R2
need be fetched per output row; - however, such output rows are then filtered based on some criteria that applies to
R2
?
If so, my problem lies in the filtering which occurs in that final step. If the condition results in no filtering (e.g. WHERE `Col_1_to_3` IN (1,2,3)
), the query runs extremely quickly (~50ms); however, if the condition restricts the rows selected (WHERE `Col_1_to_3` IN (1,2)
), the query takes considerably longer (~5s). If the restriction is to a single match (WHERE `Col_1_to_3` IN (1)
), the optimiser suggests an altogether different execution plan (which performs marginally better than 5s, but still a lot worse than 50ms). It doesn't seem as though there's a better index that can be used on that table (given it's already fully using the primary key to return one row per result?).
How should one interpret all this information? Am I right in guessing that, because such output filtering is taking place on the final table to be joined, considerable effort is wasted versus joining the table earlier and filtering such rows sooner? If so, how does one determine when in the execution plan R2
ought to be joined?
Whilst I resisted including the query & schema in full here (as I would really likely to know what to look for, not merely be told the answer), I understand it's necessary to advance the discussion:
SELECT DISTINCT
`Q`.`QID`
FROM
`S`
NATURAL JOIN `Q`
NATURAL JOIN `V`
NATURAL JOIN `R` AS `R1`
NATURAL JOIN `W`
JOIN `R` AS `R2` ON (
`R2`.`SID` = `S`.`SID`
AND `R2`.`RID` = `R1`.`RID`
AND `R2`.`VID` = `S`.`V_id`
AND `R2`.`Col_1_to_3` IN (1,2) -- this is where performance suffers!
)
WHERE
AND `S`.`SID` = @x
AND `W`.`WID` = @y
;
The definition of table R
is:
CREATE TABLE `R` (
`SID` smallint(6) unsigned NOT NULL,
`RID` smallint(6) unsigned NOT NULL,
`VID` varchar(50) NOT NULL DEFAULT '',
`Col_1_to_3` smallint(1) DEFAULT NULL,
`T` varchar(255) DEFAULT NULL,
PRIMARY KEY (`SID`,`RID`,`VID`),
KEY `L` (`SID`,`VID`,`Col_1_to_3`),
CONSTRAINT `R_f1` FOREIGN KEY (`SID`) REFERENCES `S` (`SID`),
CONSTRAINT `R_f2` FOREIGN KEY (`SID`, `VID`) REFERENCES `V` (`SID`, `VID`),
CONSTRAINT `R_f3` FOREIGN KEY (`SID`, `VID`, `Col_1_to_3`) REFERENCES `L` (`SID`, `VID`, `LID`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8