MySQL. WordPress. Slow query when using IN stateme

2019-08-04 17:20发布

问题:

I'm trying to figure out a better way to write the following query that is generated by WordPress's WP_Query class. Right now it is very slow.

SELECT SQL_CALC_FOUND_ROWS wp_posts.*
FROM wp_posts
INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
INNER JOIN wp_postmeta AS mt1 ON (wp_posts.ID = mt1.post_id)
INNER JOIN wp_postmeta AS mt2 ON (wp_posts.ID = mt2.post_id)
INNER JOIN wp_postmeta AS mt3 ON (wp_posts.ID = mt3.post_id)
INNER JOIN wp_postmeta AS mt4 ON (wp_posts.ID = mt4.post_id)
INNER JOIN wp_postmeta AS mt5 ON (wp_posts.ID = mt5.post_id)
INNER JOIN wp_postmeta AS mt6 ON (wp_posts.ID = mt6.post_id)
INNER JOIN wp_postmeta AS mt7 ON (wp_posts.ID = mt7.post_id)
INNER JOIN wp_postmeta AS mt8 ON (wp_posts.ID = mt8.post_id)

WHERE 1=1 AND wp_posts.post_type = 'gemstone'
AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'future' OR wp_posts.post_status = 'draft' OR wp_posts.post_status = 'pending' OR wp_posts.post_status = 'private')
AND (wp_postmeta.meta_key = 'gemstone_active_price'
    AND (mt1.meta_key = 'gemstone_status' AND CAST(mt1.meta_value AS CHAR) = 'Available')
    AND (mt2.meta_key = 'gemstone_length' AND CAST(mt2.meta_value AS DECIMAL(10,2)) BETWEEN '0' AND '9')
    AND (mt3.meta_key = 'gemstone_width' AND CAST(mt3.meta_value AS DECIMAL(10,2)) BETWEEN '0' AND '9')
    AND (mt4.meta_key = 'gemstone_depth' AND CAST(mt4.meta_value AS DECIMAL(10,2)) BETWEEN '0' AND '7')
    AND (mt5.meta_key = 'gemstone_color' AND CAST(mt5.meta_value AS CHAR) IN ('L','K','J','I','H','G','F','E','D'))
    AND (mt6.meta_key = 'gemstone_clarity' AND CAST(mt6.meta_value AS CHAR) IN ('I3','I2','I1','SI2','SI1','VS2','VVS2','VVS1','IF','FL'))
    AND (mt7.meta_key = 'gemstone_weight' AND CAST(mt7.meta_value AS DECIMAL(10,2)) BETWEEN '0.67' AND '1.85')
    AND (mt8.meta_key = 'gemstone_active_price' AND CAST(mt8.meta_value AS DECIMAL(10,2)) BETWEEN '960' AND '2300')
)

GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value+0 ASC
LIMIT 0, 20

I know it looks like a big mess, but the whole thing executes very quickly when I don't have the 2 IN statements in the WHERE clause (mt5 and mt6 above). The problem is, I don't know SQL well enough to figure out another way to write the query that avoids using the IN statements. Any ideas?

UPDATE: Here is the EXPLAIN output for this query in case it will help anyone. If anyone has any other thoughts, I'm open to anything. This has me completely stumped.

id  select_type     table           type    possible_keys               key         key_len     ref                         rows    Extra
1   SIMPLE          wp_postmeta     ref     post_id,meta_key            meta_key    768         const                       2       Using where; Using temporary; Using filesort
1   SIMPLE          mt1             ref     post_id,meta_key            post_id     8           db.wp_postmeta.post_id      2       Using where
1   SIMPLE          mt2             ref     post_id,meta_key            post_id     8           db.mt1.post_id              2       Using where
1   SIMPLE          mt3             ref     post_id,meta_key            post_id     8           db.wp_postmeta.post_id      2       Using where
1   SIMPLE          mt4             ref     post_id,meta_key            post_id     8           db.mt2.post_id              2       Using where
1   SIMPLE          mt5             ref     post_id,meta_key            post_id     8           db.wp_postmeta.post_id      2       Using where
1   SIMPLE          mt6             ref     post_id,meta_key            post_id     8           db.wp_postmeta.post_id      2       Using where
1   SIMPLE          mt7             ref     post_id,meta_key            post_id     8           db.mt3.post_id              2       Using where
1   SIMPLE          mt8             ref     post_id,meta_key            post_id     8           db.wp_postmeta.post_id      2       Using where
1   SIMPLE          wp_posts        eq_ref  PRIMARY,type_status_date    PRIMARY     8           db.wp_postmeta.post_id      1       Using where

UPDATE 2: After some more experimentation, I have realized that it's not just the IN() statements that are slowing this query down. It seems that any combination of more than one IN() with more than 3 BETWEEN...AND... statements has a dramatic impact on performance.

For example, the query executes in about 0.04s if I drop the last 2 AND clauses (versus 4.9s with them), or it executes in 0.04s if I drop the 2 AND clauses with the IN() statements. This leads me to think that a 2 query solution might be best, but I have no idea how to implement that via the WordPress WP_Query API, and if I did, I wonder if that would even be quicker than just doing one query and then filtering the results via PHP.

I hate the idea of filtering with PHP because I've read in several places that filtering should be left to the database because that's what databases are good at. By the way, if it makes any difference, I'm running these queries on a WordPress 3.3.1 installation on my localhost WAMP server on a computer with plenty of processing power (Intel i7, 12 GB RAM, etc.).

UPDATE 3: I was thinking about just giving up and removing all the IN() clauses from the query and filter for those via PHP, but that has some serious draw backs. Besides being inefficient and a code smell, it won't allow me to control my pagination correctly. When everything is filtered in the database, I can simply use the LIMIT clause to handle pagination. When I filter with PHP, I have no idea how many results will be returned for any given offset. So, all the filtering really needs to be done in the database, the question is how. Does anyone have any additional suggestions at all for me? Would any additional information be helpful to anyone?

UPDATE 4: In my search for a solution to this, I posted it as an issue in the WordPress core trac system (http://core.trac.wordpress.org/ticket/20134). One of the devs there suggested I try to use taxonomies instead of meta data for anything that I use IN for in my meta query. I took that advice, and I saw a performance improvement, but unfortunately, it wasn't nearly enough. The old query took 4+ seconds to run, and using taxonomies it got down to 1+ seconds. However, I realized that I actually need 4 IN type clauses (not the original 2). With the 2 additional taxonomy clauses, the query takes 18+ seconds to execute. So, I'm back to square one. One thought I had (that's probably delusional) is that this could be running so slow because I have so few posts that meet the criteria. For testing purposes, I only have 3 posts in the database that have the 'gemstone' post type. Could that have anything to do with it?

If anyone is interested, my new SQL looks like this:

SELECT SQL_CALC_FOUND_ROWS wp_posts.*
FROM wp_posts
INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id)
INNER JOIN wp_term_relationships AS tt1 ON (wp_posts.ID = tt1.object_id)
INNER JOIN wp_term_relationships AS tt2 ON (wp_posts.ID = tt2.object_id)
INNER JOIN wp_term_relationships AS tt3 ON (wp_posts.ID = tt3.object_id)    
INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
INNER JOIN wp_postmeta AS mt1 ON (wp_posts.ID = mt1.post_id)
INNER JOIN wp_postmeta AS mt2 ON (wp_posts.ID = mt2.post_id)
INNER JOIN wp_postmeta AS mt3 ON (wp_posts.ID = mt3.post_id)
INNER JOIN wp_postmeta AS mt4 ON (wp_posts.ID = mt4.post_id)
INNER JOIN wp_postmeta AS mt5 ON (wp_posts.ID = mt5.post_id)
INNER JOIN wp_postmeta AS mt6 ON (wp_posts.ID = mt6.post_id)

WHERE 1=1
AND ( wp_term_relationships.term_taxonomy_id IN (71,72,73,74)
    AND tt1.term_taxonomy_id IN (89,90,91,92,93,95,96,97)
    AND tt2.term_taxonomy_id IN (56,50,104,53)
    AND tt3.term_taxonomy_id IN (59,60,62)
)
AND wp_posts.post_type = 'gemstone'
AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'future' OR wp_posts.post_status = 'draft' OR wp_posts.post_status = 'pending' OR wp_posts.post_status = 'private')
AND (wp_postmeta.meta_key = 'gemstone_weight'
    AND (mt1.meta_key = 'gemstone_status' AND CAST(mt1.meta_value AS CHAR) = 'Available')
    AND (mt2.meta_key = 'gemstone_length' AND CAST(mt2.meta_value AS DECIMAL(8,2)) BETWEEN '0' AND '9')
    AND (mt3.meta_key = 'gemstone_width' AND CAST(mt3.meta_value AS DECIMAL(8,2)) BETWEEN '0' AND '9' )
    AND (mt4.meta_key = 'gemstone_depth' AND CAST(mt4.meta_value AS DECIMAL(8,2)) BETWEEN '0' AND '7')
    AND (mt5.meta_key = 'gemstone_weight' AND CAST(mt5.meta_value AS DECIMAL(8,2)) BETWEEN '0.81' AND '1.81')
    AND (mt6.meta_key = 'gemstone_active_price' AND CAST(mt6.meta_value AS DECIMAL(8,2)) BETWEEN '1083.9' AND '2078.26')
)

GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value+0 ASC
LIMIT 0, 20

and the new EXPLAIN output is as follows:

id  select_type     table                   type    possible_keys               key         key_len     ref                     rows    Extra
1   SIMPLE          wp_postmeta             ref     post_id,meta_key            meta_key    768         const                   3       Using where; Using temporary; Using filesort
1   SIMPLE          tt3                     ref     PRIMARY,term_taxonomy_id    PRIMARY     8           db.wp_postmeta.post_id  1       Using where; Using index
1   SIMPLE          tt2                     ref     PRIMARY,term_taxonomy_id    PRIMARY     8           db.wp_postmeta.post_id  1       Using where; Using index
1   SIMPLE          wp_term_relationships   ref     PRIMARY,term_taxonomy_id    PRIMARY     8           db.tt2.object_id        1       Using where; Using index
1   SIMPLE          wp_posts                eq_ref  PRIMARY,type_status_date    PRIMARY     8           db.wp_postmeta.post_id  1       Using where
1   SIMPLE          tt1                     ref     PRIMARY,term_taxonomy_id    PRIMARY     8           db.wp_posts.ID          1       Using where; Using index
1   SIMPLE          mt5                     ref     post_id,meta_key            post_id     8           db.wp_posts.ID          2       Using where
1   SIMPLE          mt6                     ref     post_id,meta_key            post_id     8           db.wp_posts.ID          2       Using where
1   SIMPLE          mt1                     ref     post_id,meta_key            post_id     8           db.mt5.post_id          2       Using where
1   SIMPLE          mt2                     ref     post_id,meta_key            post_id     8           db.mt1.post_id          2       Using where
1   SIMPLE          mt3                     ref     post_id,meta_key            post_id     8           db.tt2.object_id        2       Using where
1   SIMPLE          mt4                     ref     post_id,meta_key            post_id     8           db.tt3.object_id        2       Using where

UPDATE 5: Because of a comment, I recently took another stab at optimizing this query, but I was left with the conclusion that the SQL pretty much has to be setup the way it is. However, in testing some alternatives, I found that strangely the query runs much faster now. I haven't updated my MySQL server, so the only reason I can fathom is that WordPress updated their database structure in someway to improve performance. The exact same query shown in update 4 now takes approximately 2.4 seconds. Still far too long in my opinion (so I'm still using STRAIGHT_JOIN as shown in my answer below), but I was surprised by the improvement, and it makes me wonder if a different solution is out there that can optimize this even more. Here is the new EXPLAIN output. It looks almost identical to me, but I don't really know how to interpret it.

+-----+--------------+------------------------+---------+---------------------------+-----------+----------+-------------------------------------+-------+----------------------------------------------+
| id  | select_type  |         table          |  type   |      possible_keys        |   key     | key_len  |                ref                  | rows  |                    Extra                     |
+-----+--------------+------------------------+---------+---------------------------+-----------+----------+-------------------------------------+-------+----------------------------------------------+
|  1  | SIMPLE       | wp_postmeta            | ref     | post_id,meta_key          | meta_key  |     768  | const                               |    5  | Using where; Using temporary; Using filesort |
|  1  | SIMPLE       | wp_term_relationships  | ref     | PRIMARY,term_taxonomy_id  | PRIMARY   |       8  | db.wp_postmeta.post_id              |    1  | Using where; Using index                     |
|  1  | SIMPLE       | tt2                    | ref     | PRIMARY,term_taxonomy_id  | PRIMARY   |       8  | db.wp_term_relationships.object_id  |    1  | Using where; Using index                     |
|  1  | SIMPLE       | tt3                    | ref     | PRIMARY,term_taxonomy_id  | PRIMARY   |       8  | db.wp_term_relationships.object_id  |    1  | Using where; Using index                     |
|  1  | SIMPLE       | wp_posts               | eq_ref  | PRIMARY,type_status_date  | PRIMARY   |       8  | db.wp_postmeta.post_id              |    1  | Using where                                  |
|  1  | SIMPLE       | tt1                    | ref     | PRIMARY,term_taxonomy_id  | PRIMARY   |       8  | db.wp_posts.ID                      |    1  | Using where; Using index                     |
|  1  | SIMPLE       | mt3                    | ref     | post_id,meta_key          | post_id   |       8  | db.tt2.object_id                    |    3  | Using where                                  |
|  1  | SIMPLE       | mt4                    | ref     | post_id,meta_key          | post_id   |       8  | db.tt3.object_id                    |    3  | Using where                                  |
|  1  | SIMPLE       | mt5                    | ref     | post_id,meta_key          | post_id   |       8  | db.wp_posts.ID                      |    3  | Using where                                  |
|  1  | SIMPLE       | mt6                    | ref     | post_id,meta_key          | post_id   |       8  | db.wp_posts.ID                      |    3  | Using where                                  |
|  1  | SIMPLE       | mt1                    | ref     | post_id,meta_key          | post_id   |       8  | db.mt5.post_id                      |    3  | Using where                                  |
|  1  | SIMPLE       | mt2                    | ref     | post_id,meta_key          | post_id   |       8  | db.mt3.post_id                      |    3  | Using where                                  |
+-----+--------------+------------------------+---------+---------------------------+-----------+----------+-------------------------------------+-------+----------------------------------------------+

回答1:

The "solution" I've stumbled on for now is pretty ugly, but for some inexplicable reason, it works. Adding the STRAIGHT_JOIN optimizer hint dropped the execution time from 18+ seconds to approximately 0.0022 seconds. Based on common sense and this question (When to use STRAIGHT_JOIN with MySQL), this solution seems like a bad idea, but it's the only thing I've tried that has worked. So, for now at least, I'm sticking to it. If anyone has any thoughts on why I shouldn't do this, or what I should try instead, I would love to hear them.

If anyone is curious, I implemented it as a WordPress filter like so:

function use_straight_join( $distinct_clause ) {

    $distinct_clause = ( $use_straight_join ) ? 'STRAIGHT_JOIN' . $distinct_clause : $distinct_clause;

    return $distinct_clause;
}
add_filter( 'posts_distinct', 'use_straight_join' );

And for completeness, here is the EXPLAIN output for the query when using STRAIGHT_JOIN. Again, I'm baffled. The old query used only ref and eq_ref which I understand to be faster than range, but this is orders of magnitude faster for some reason.

+-----+--------------+------------------------+--------+---------------------------+-------------------+----------+-----------------+-------+----------------------------------------------+
| id  | select_type  |         table          | type   |      possible_keys        |       key         | key_len  |      ref        | rows  |                    Extra                     |
+-----+--------------+------------------------+--------+---------------------------+-------------------+----------+-----------------+-------+----------------------------------------------+
|  1  | SIMPLE       | wp_posts               | range  | PRIMARY,type_status_date  | type_status_date  |     124  | NULL            |    6  | Using where; Using temporary; Using filesort |
|  1  | SIMPLE       | wp_postmeta            | ref    | post_id,meta_key          | post_id           |       8  | db.wp_posts.ID  |    2  | Using where                                  |
|  1  | SIMPLE       | mt1                    | ref    | post_id,meta_key          | post_id           |       8  | db.wp_posts.ID  |    2  | Using where                                  |
|  1  | SIMPLE       | mt2                    | ref    | post_id,meta_key          | post_id           |       8  | db.wp_posts.ID  |    2  | Using where                                  |
|  1  | SIMPLE       | mt3                    | ref    | post_id,meta_key          | post_id           |       8  | db.wp_posts.ID  |    2  | Using where                                  |
|  1  | SIMPLE       | mt4                    | ref    | post_id,meta_key          | post_id           |       8  | db.wp_posts.ID  |    2  | Using where                                  |
|  1  | SIMPLE       | mt5                    | ref    | post_id,meta_key          | post_id           |       8  | db.mt3.post_id  |    2  | Using where                                  |
|  1  | SIMPLE       | mt6                    | ref    | post_id,meta_key          | post_id           |       8  | db.wp_posts.ID  |    2  | Using where                                  |
|  1  | SIMPLE       | wp_term_relationships  | ref    | PRIMARY,term_taxonomy_id  | PRIMARY           |       8  | db.wp_posts.ID  |    1  | Using where; Using index                     |
|  1  | SIMPLE       | tt1                    | ref    | PRIMARY,term_taxonomy_id  | PRIMARY           |       8  | db.wp_posts.ID  |    1  | Using where; Using index                     |
|  1  | SIMPLE       | tt2                    | ref    | PRIMARY,term_taxonomy_id  | PRIMARY           |       8  | db.mt1.post_id  |    1  | Using where; Using index                     |
|  1  | SIMPLE       | tt3                    | ref    | PRIMARY,term_taxonomy_id  | PRIMARY           |       8  | db.wp_posts.ID  |    1  | Using where; Using index                     |
+-----+--------------+------------------------+--------+---------------------------+-------------------+----------+-----------------+-------+----------------------------------------------+


回答2:

Can you check if the "meta_value" field has an index? This could make a huge speed improvement.

I'm not sure about it, but maybe the CAST() functions cause some slowdown? Are they necessary?