Slow SQL query: using the same table in two differ

2020-06-25 04:44发布

问题:

Really hoping that some kind of performance whiz can explain to me why a single join is causing the query to become 10x slower. (Also, please don't laugh about the size of this query! I wanted to get out the whole catalog in my database to output with one query. I'm not sure if it would be faster to break it into smaller queries but that doesn't seem right.)

SELECT `c`.`categoryID`,
       `cl`.`name` AS `category_name`,
       `v`.*,
       TRUE AS `categoried`,
       GROUP_CONCAT(DISTINCT t_v.iso_3166_1_alpha_2) AS `video_territories`,
       GROUP_CONCAT(DISTINCT t_c.iso_3166_1_alpha_2) AS `category_territories`,
       `vl`.*,
       GROUP_CONCAT(DISTINCT kl.name) AS `keywords`
FROM `tblCategories` AS `c`
INNER JOIN `tblCategoryLocalisedData` AS `cl` ON c.categoryID = cl.categoryID
LEFT JOIN `tblCategoryDurations` AS `cd` ON c.categoryID = cd.categoryID
LEFT JOIN `tblCategoryRules` AS `cr` ON c.categoryID = cr.categoryID
LEFT JOIN `tblCategoryVideos` AS `cv` ON c.categoryID = cv.categoryID
LEFT JOIN `tblVideos` AS `v` ON cv.videoID = v.videoID
LEFT JOIN `tblVideoTerritories` AS `vt` ON vt.videoID = v.videoID
LEFT JOIN `tblCategoryTerritories` AS `ct` ON ct.categoryID = c.categoryID
INNER JOIN `tblTerritories` AS `t_v` ON t_v.territoryID = vt.territoryID
INNER JOIN `tblTerritories` AS `t_c` ON t_c.territoryID = ct.territoryID
INNER JOIN `tblVideoLocalisedData` AS `vl` ON vl.videoID = v.videoID
LEFT JOIN `tblVideoKeywords` AS `vk` ON v.videoID = vk.videoID
LEFT JOIN `tblKeywords` AS `k` ON vk.keywordID = k.keywordID
LEFT JOIN `tblKeywordLocalisedData` AS `kl` ON kl.keywordID = k.keywordID
INNER JOIN `tblLanguages` AS `l`
WHERE (cv.disabled IS NULL)
  AND (cd.start_date < NOW() OR cd.start_date IS NULL)
  AND (cd.end_date > NOW() OR cd.end_date IS NULL)
  AND (cr.name IS NULL)
  AND (l.languageID = cl.languageID OR cl.languageID IS NULL)
  AND (l.languageID = kl.languageID OR kl.languageID IS NULL)
  AND (l.languageID = vl.languageID OR vl.languageID IS NULL)
  AND (l.iso_639_1 = 'en')
GROUP BY `v`.`videoID`, `c`.`categoryID`
ORDER BY `c`.`categoryID` ASC

When I run the above query it takes 1 whole second to finish. I tried running an EXPLAIN on it and it gave me this:

+----+-------------+-------+--------+--------------------------------------------------------------------------------------+-----------------------------------------+---------+------------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                                                                        | key                                     | key_len | ref                    | rows | Extra                                        |
+----+-------------+-------+--------+--------------------------------------------------------------------------------------+-----------------------------------------+---------+------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | cv    | ALL    | fk_tblCategoryVideos_tblCategories1,fk_tblCategoryVideos_tblVideos1                  | NULL                                    | NULL    | NULL                   |    2 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | c     | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.cv.categoryID  |    1 | Using index                                  |
|  1 | SIMPLE      | cd    | ref    | fk_tblCategoryDurations_tblCategories                                                | fk_tblCategoryDurations_tblCategories   | 4       | db.cv.categoryID  |    1 | Using where                                  |
|  1 | SIMPLE      | cr    | ref    | fk_tblCategoryRules_tblCategories1                                                   | fk_tblCategoryRules_tblCategories1      | 4       | db.cv.categoryID  |    1 | Using where; Not exists                      |
|  1 | SIMPLE      | vt    | ref    | fk_tblVideoTerritories_tblVideos1,fk_tblVideoTerritories_tblTerritories1             | fk_tblVideoTerritories_tblVideos1       | 4       | db.cv.videoID     |    1 | Using where                                  |
|  1 | SIMPLE      | t_v   | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.vt.territoryID |    1 |                                              |
|  1 | SIMPLE      | v     | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.vt.videoID     |    1 | Using where                                  |
|  1 | SIMPLE      | vk    | ref    | fk_tblVideoKeywords_tblVideos1                                                       | fk_tblVideoKeywords_tblVideos1          | 4       | db.cv.videoID     |    6 |                                              |
|  1 | SIMPLE      | k     | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.vk.keywordID   |    1 | Using index                                  |
|  1 | SIMPLE      | kl    | ref    | fk_tblKeywordLocalisedData_tblKeywords1                                              | fk_tblKeywordLocalisedData_tblKeywords1 | 4       | db.k.keywordID    |    1 |                                              |
|  1 | SIMPLE      | cl    | ALL    | fk_tblCategoryLocalisedData_tblCategories1,fk_tblCategoryLocalisedData_tblLanguages1 | NULL                                    | NULL    | NULL                   |    5 | Using where; Using join buffer               |
|  1 | SIMPLE      | l     | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.cl.languageID  |    1 | Using where                                  |
|  1 | SIMPLE      | ct    | ALL    | fk_tblCategoryTerritories_tblCategories1,fk_tblCategoryTerritories_tblTerritories1   | NULL                                    | NULL    | NULL                   |    2 | Using where; Using join buffer               |
|  1 | SIMPLE      | vl    | ALL    | fk_tblVideoLocalisedData_tblLanguages1,fk_tblVideoLocalisedData_tblVideos1           | NULL                                    | NULL    | NULL                   |    9 | Using where; Using join buffer               |
|  1 | SIMPLE      | t_c   | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.ct.territoryID |    1 |                                              |
+----+-------------+-------+--------+--------------------------------------------------------------------------------------+-----------------------------------------+---------+------------------------+------+----------------------------------------------+

However I have no idea what that means. How do I troubleshoot this? I do, thankfully, know which parts of the query cause the massive slow-down. If I remove the joins from either tblVideoTerritories (vt) to tblTerritories (t_v) or tblCategoryTerritories (ct) to tblTerritories (t_c) then everything speeds up considerably. I thought to begin with it might have been because of the GROUP_CONCAT or DISTINCT but I tried removing these and it made barely any change. It seems that the performance problem is caused by joining to the same table 'tblTerritories' twice. If I only have one of those joins the query only takes 0.1 seconds or 0.2 seconds to run -- this is still a long time but it's a better start!

What I want to know is how can I fix this performance problem? Why would joining to the same table twice cause the query to take 10x as long?!

Thanks for any help!

edit: A SHOW CREATE TABLE on tblVideoTerritories gives me this:

CREATE TABLE `tblVideoTerritories` (
  `videoTerritoryID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `videoID` int(10) unsigned NOT NULL,
  `territoryID` int(10) unsigned NOT NULL,
  PRIMARY KEY (`videoTerritoryID`),
  KEY `fk_tblVideoTerritories_tblVideos1` (`videoID`),
  KEY `fk_tblVideoTerritories_tblTerritories1` (`territoryID`),
  CONSTRAINT `fk_tblVideoTerritories_tblTerritories1` FOREIGN KEY (`territoryID`) REFERENCES `tblTerritories` (`territoryID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `fk_tblVideoTerritories_tblVideos1` FOREIGN KEY (`videoID`) REFERENCES `tblVideos` (`videoID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

A SHOW CREATE TABLE on tblCategoryTerritories gives me this:

CREATE TABLE `tblCategoryTerritories` (
  `categoryTerritoryID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `categoryID` int(10) unsigned NOT NULL,
  `territoryID` int(10) unsigned NOT NULL,
  PRIMARY KEY (`categoryTerritoryID`),
  KEY `fk_tblCategoryTerritories_tblCategories1` (`categoryID`),
  KEY `fk_tblCategoryTerritories_tblTerritories1` (`territoryID`),
  CONSTRAINT `fk_tblCategoryTerritories_tblCategories1` FOREIGN KEY (`categoryID`) REFERENCES `tblCategories` (`categoryID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `fk_tblCategoryTerritories_tblTerritories1` FOREIGN KEY (`territoryID`) REFERENCES `tblTerritories` (`territoryID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

A SHOW CREATE TABLE on tblTerritories gives me this:

CREATE TABLE `tblTerritories` (
  `territoryID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `iso_3166_1_alpha_2` char(2) COLLATE utf8_unicode_ci DEFAULT NULL,
  `iso_3166_1_alpha_3` char(3) COLLATE utf8_unicode_ci DEFAULT NULL,
  `defaultLanguageID` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`territoryID`),
  KEY `fk_tblTerritories_tblLanguages1` (`defaultLanguageID`),
  KEY `iso_3166_1_alpha_2` (`iso_3166_1_alpha_2`),
  CONSTRAINT `fk_tblTerritories_tblLanguages1` FOREIGN KEY (`defaultLanguageID`) REFERENCES `tblLanguages` (`languageID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=8 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

edit2: The reason for joining twice to the same territory is that I need to generate two separate lists of territories using the GROUP_CONCAT at the top of the query. I need one for a video and one for the category it belongs to.

edit3: Interestingly if I cut down the query right down to its bare bones then it is very fast (0.00 seconds) even while joining to the same table twice:

    SELECT `c`.`categoryID`,
           `v`.`videoID`,
           GROUP_CONCAT(DISTINCT t_v.iso_3166_1_alpha_2) AS `video_territories`,
           GROUP_CONCAT(DISTINCT t_c.iso_3166_1_alpha_2) AS `category_territories`
FROM `tblCategories` AS `c`
LEFT JOIN `tblCategoryVideos` AS `cv` ON c.categoryID = cv.categoryID
LEFT JOIN `tblVideos` AS `v` ON cv.videoID = v.videoID
LEFT JOIN `tblVideoTerritories` AS `vt` ON vt.videoID = v.videoID
LEFT JOIN `tblCategoryTerritories` AS `ct` ON ct.categoryID = c.categoryID
INNER JOIN `tblTerritories` AS `t_v` ON t_v.territoryID = vt.territoryID
INNER JOIN `tblTerritories` AS `t_c` ON t_c.territoryID = ct.territoryID
GROUP BY `v`.`videoID`, `c`.`categoryID`

edit4: If I switch from using the WHERE as a makeshift ON then I still have a query which takes 0.98 seconds:

SELECT `c`.`categoryID`,
       `cl`.`name` AS `category_name`,
       `v`.*,
       TRUE AS `categoried`,
       GROUP_CONCAT(DISTINCT t_v.iso_3166_1_alpha_2) AS `video_territories`,
       GROUP_CONCAT(DISTINCT t_c.iso_3166_1_alpha_2) AS `category_territories`,
       `vl`.*,
       GROUP_CONCAT(DISTINCT kl.name) AS `keywords`
FROM `tblCategories` AS `c`
INNER JOIN `tblCategoryLocalisedData` AS `cl` ON c.categoryID = cl.categoryID
LEFT JOIN `tblCategoryDurations` AS `cd` ON c.categoryID = cd.categoryID
LEFT JOIN `tblCategoryRules` AS `cr` ON c.categoryID = cr.categoryID
LEFT JOIN `tblCategoryVideos` AS `cv` ON c.categoryID = cv.categoryID
LEFT JOIN `tblVideos` AS `v` ON cv.videoID = v.videoID
LEFT JOIN `tblVideoTerritories` AS `vt` ON vt.videoID = v.videoID
LEFT JOIN `tblCategoryTerritories` AS `ct` ON ct.categoryID = c.categoryID
INNER JOIN `tblTerritories` AS `t_v` ON t_v.territoryID = vt.territoryID
INNER JOIN `tblTerritories` AS `t_c` ON t_c.territoryID = ct.territoryID
INNER JOIN `tblVideoLocalisedData` AS `vl` ON vl.videoID = v.videoID
LEFT JOIN `tblVideoKeywords` AS `vk` ON v.videoID = vk.videoID
LEFT JOIN `tblKeywords` AS `k` ON vk.keywordID = k.keywordID
LEFT JOIN `tblKeywordLocalisedData` AS `kl` ON kl.keywordID = k.keywordID
INNER JOIN `tblLanguages` AS `l` ON (l.languageID = cl.languageID OR cl.languageID IS NULL) AND (l.languageID = kl.languageID OR kl.languageID IS NULL) AND (l.languageID = vl.languageID OR vl.languageID IS NULL)
WHERE (cv.disabled IS NULL)
  AND (cd.start_date < NOW() OR cd.start_date IS NULL)
  AND (cd.end_date > NOW() OR cd.end_date IS NULL)
  AND (cr.name IS NULL) AND (l.iso_639_1 = 'en')
GROUP BY `v`.`videoID`, `c`.`categoryID`
ORDER BY `c`.`categoryID` ASC

edit5: If I remove the keyword related joins the query happens in 0.09 seconds... Removing the tblKeyword and tblKeywordLocalisedData but leaving tblVideoKeywords gives me 0.80 seconds. Removing tblVideoKeywords gives me 0.09 seconds.

But it seems to have indexes so yet again I don't get it:

CREATE TABLE `tblVideoKeywords` (
  `videoKeywordID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `videoID` int(10) unsigned NOT NULL,
  `keywordID` int(10) unsigned NOT NULL,
  PRIMARY KEY (`videoKeywordID`),
  KEY `fk_tblVideoKeywords_tblVideos1` (`videoID`),
  KEY `fk_tblVideoKeywords_tblKeywords1` (`keywordID`),
  CONSTRAINT `fk_tblVideoKeywords_tblKeywords1` FOREIGN KEY (`keywordID`) REFERENCES `tblKeywords` (`keywordID`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `fk_tblVideoKeywords_tblVideos1` FOREIGN KEY (`videoID`) REFERENCES `tblVideos` (`videoID`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=18 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

edit6: Using the query provide by DRapp makes everything a lot faster. The EXPLAIN of his query now gives me:

+----+-------------+---------+--------+--------------------------------------------------------------------------------------+-----------------------------------------+---------+------------------------+------+----------------------------------------------+
| id | select_type | table   | type   | possible_keys                                                                        | key                                     | key_len | ref                    | rows | Extra                                        |
+----+-------------+---------+--------+--------------------------------------------------------------------------------------+-----------------------------------------+---------+------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | c       | index  | PRIMARY                                                                              | PRIMARY                                 | 4       | NULL                   |    3 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | cl      | ALL    | fk_tblCategoryLocalisedData_tblCategories1,fk_tblCategoryLocalisedData_tblLanguages1 | NULL                                    | NULL    | NULL                   |    5 | Using where; Using join buffer               |
|  1 | SIMPLE      | lang_cl | ALL    | PRIMARY                                                                              | NULL                                    | NULL    | NULL                   |    2 | Using where; Using join buffer               |
|  1 | SIMPLE      | cd      | ref    | fk_tblCategoryDurations_tblCategories                                                | fk_tblCategoryDurations_tblCategories   | 4       | db.c.categoryID   |    1 |                                              |
|  1 | SIMPLE      | cr      | ref    | fk_tblCategoryRules_tblCategories1                                                   | fk_tblCategoryRules_tblCategories1      | 4       | db.c.categoryID   |    1 | Using where; Not exists                      |
|  1 | SIMPLE      | cv      | ALL    | fk_tblCategoryVideos_tblCategories1,fk_tblCategoryVideos_tblVideos1                  | NULL                                    | NULL    | NULL                   |    2 | Using where; Using join buffer               |
|  1 | SIMPLE      | ct      | ALL    | fk_tblCategoryTerritories_tblCategories1,fk_tblCategoryTerritories_tblTerritories1   | NULL                                    | NULL    | NULL                   |    2 | Using where; Using join buffer               |
|  1 | SIMPLE      | t_c     | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.ct.territoryID |    1 |                                              |
|  1 | SIMPLE      | v       | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.cv.videoID     |    1 | Using where                                  |
|  1 | SIMPLE      | vt      | ref    | fk_tblVideoTerritories_tblVideos1,fk_tblVideoTerritories_tblTerritories1             | fk_tblVideoTerritories_tblVideos1       | 4       | db.v.videoID      |    1 | Using where                                  |
|  1 | SIMPLE      | t_v     | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.vt.territoryID |    1 |                                              |
|  1 | SIMPLE      | vl      | ALL    | fk_tblVideoLocalisedData_tblLanguages1,fk_tblVideoLocalisedData_tblVideos1           | NULL                                    | NULL    | NULL                   |    9 | Using where; Using join buffer               |
|  1 | SIMPLE      | lang_vl | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.vl.languageID  |    1 | Using where                                  |
|  1 | SIMPLE      | vk      | ALL    | fk_tblVideoKeywords_tblVideos1,fk_tblVideoKeywords_tblKeywords1                      | NULL                                    | NULL    | NULL                   |   15 | Using where; Using join buffer               |
|  1 | SIMPLE      | k       | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.vk.keywordID   |    1 | Using where; Using index                     |
|  1 | SIMPLE      | kl      | ref    | fk_tblKeywordLocalisedData_tblKeywords1,fk_tblKeywordLocalisedData_tblLanguages1     | fk_tblKeywordLocalisedData_tblKeywords1 | 4       | db.k.keywordID    |    1 | Using where                                  |
|  1 | SIMPLE      | lang_kl | eq_ref | PRIMARY                                                                              | PRIMARY                                 | 4       | db.kl.languageID  |    1 | Using where                                  |
+----+-------------+---------+--------+--------------------------------------------------------------------------------------+-----------------------------------------+---------+------------------------+------+----------------------------------------------+
17 rows in set (0.01 sec)

回答1:

Per several other questions I've answered similarly, just by adding a "STRAIGHT_JOIN" and slight restructuring can help. The query optimizer will actually try to think for you with respect to all the tables, try to find a table with less records and join that to the larger thus causing total chaos. This happened when I was doing gov't data querying of 14+ million records with lookups to 15+ sub-tables... Very similar to what you have going on here. It took a query that was running 30+ hours on a dedicated, stand-alone server and hung it, down to under 2 hours... Try the following:

In addition to some visual cleansing / ordering of the joins to what I am used to, I've also taken some of the NOW() vs NULL and moved them into the join. If you query a left join and have the dates as part of the join qualifier, you'll have those records out of bounds excluded and thus leave either a NULL resultset or a valid entry, no need to double-up that qualifier.

SELECT STRAIGHT_JOIN
      c.categoryID,
      cl.name AS category_name,
      v.*,
      TRUE AS categoried,
      GROUP_CONCAT(DISTINCT t_v.iso_3166_1_alpha_2) AS video_territories,
      GROUP_CONCAT(DISTINCT t_c.iso_3166_1_alpha_2) AS category_territories,
      vl.*,
      GROUP_CONCAT(DISTINCT kl.name) AS keywords
   FROM 
      tblCategories AS c
         INNER JOIN tblCategoryLocalisedData AS cl
            ON c.categoryID = cl.categoryID 
            INNER JOIN tblLanguages AS lang_cl
               ON l.languageID = lang_cl.languageID
                  AND lang_cl.iso_639_1 = 'en'
         LEFT JOIN tblCategoryDurations AS cd
            ON c.categoryID = cd.categoryID 
              AND cd.start_date < NOW()
              AND cd.end_date > NOW()
         LEFT JOIN tblCategoryRules AS cr
            ON c.categoryID = cr.categoryID 
         LEFT JOIN tblCategoryVideos AS cv
            ON c.categoryID = cv.categoryID 
         LEFT JOIN tblCategoryTerritories AS ct
            ON c.categoryID = ct.categoryID
            INNER JOIN tblTerritories AS t_c 
               ON ct.territoryID = t_c.territoryID
         LEFT JOIN tblVideos AS v
            ON cv.videoID = v.videoID 
            LEFT JOIN tblVideoTerritories AS vt
               ON v.videoID = vt.videoID
               INNER JOIN tblTerritories AS t_v
                  ON vt.territoryID = t_v.territoryID
            INNER JOIN tblVideoLocalisedData AS vl
               ON v.videoID = vl.videoID
               INNER JOIN tblLanguages AS lang_vl
                   ON vl.languageID = lang_vl.languageID
                      AND lang_vl.iso_639_1 = 'en'
            LEFT JOIN tblVideoKeywords AS vk
               ON v.videoID = vk.videoID 
               LEFT JOIN tblKeywords AS k
                  ON vk.keywordID = k.keywordID 
                  LEFT JOIN tblKeywordLocalisedData AS kl
                     ON k.keywordID = kl.keywordID
                     INNER JOIN tblLanguages AS lang_kl
                        ON kl.languageID = lang_kl.languageID
                          AND lang_kl.iso_639_1 = 'en'
   WHERE 
          (  cv.disabled IS NULL)   
      AND (  cr.name IS NULL)   
   GROUP BY 
      v.videoID, 
      c.categoryID
   ORDER BY 
      c.categoryID ASC 

STRAIGHT_JOIN as I explained above basically tells the optimizer, "dont think for me".... do the query in the order I've told you to. In this case, using "tblCategories" as the primary table and linking everything else off that. The optimizer, even with the explain may try be slow and try a different approach the next time you run the query. So, it could try to use the Languages table first and do back-paddling through the other tables and choke. Also, by having the "AND" portions such as the dates direct to those left joins, those joins simplify the WHERE as you see... as you had in the where for NULL or it existed, just applied to that particular join... keeps the where clean.

Additionally, by keeping the relationships direct and indented respective to what they were joining to, its easier to understand what is linked to where...

I would additionally like to see the final "EXPLAIN" and see what it comes up with.