I'm trying to figure out a better way to write the following query that is generated by WordPress's WP_Query
class. Right now it is very slow.
SELECT SQL_CALC_FOUND_ROWS wp_posts.*
FROM wp_posts
INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
INNER JOIN wp_postmeta AS mt1 ON (wp_posts.ID = mt1.post_id)
INNER JOIN wp_postmeta AS mt2 ON (wp_posts.ID = mt2.post_id)
INNER JOIN wp_postmeta AS mt3 ON (wp_posts.ID = mt3.post_id)
INNER JOIN wp_postmeta AS mt4 ON (wp_posts.ID = mt4.post_id)
INNER JOIN wp_postmeta AS mt5 ON (wp_posts.ID = mt5.post_id)
INNER JOIN wp_postmeta AS mt6 ON (wp_posts.ID = mt6.post_id)
INNER JOIN wp_postmeta AS mt7 ON (wp_posts.ID = mt7.post_id)
INNER JOIN wp_postmeta AS mt8 ON (wp_posts.ID = mt8.post_id)
WHERE 1=1 AND wp_posts.post_type = 'gemstone'
AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'future' OR wp_posts.post_status = 'draft' OR wp_posts.post_status = 'pending' OR wp_posts.post_status = 'private')
AND (wp_postmeta.meta_key = 'gemstone_active_price'
AND (mt1.meta_key = 'gemstone_status' AND CAST(mt1.meta_value AS CHAR) = 'Available')
AND (mt2.meta_key = 'gemstone_length' AND CAST(mt2.meta_value AS DECIMAL(10,2)) BETWEEN '0' AND '9')
AND (mt3.meta_key = 'gemstone_width' AND CAST(mt3.meta_value AS DECIMAL(10,2)) BETWEEN '0' AND '9')
AND (mt4.meta_key = 'gemstone_depth' AND CAST(mt4.meta_value AS DECIMAL(10,2)) BETWEEN '0' AND '7')
AND (mt5.meta_key = 'gemstone_color' AND CAST(mt5.meta_value AS CHAR) IN ('L','K','J','I','H','G','F','E','D'))
AND (mt6.meta_key = 'gemstone_clarity' AND CAST(mt6.meta_value AS CHAR) IN ('I3','I2','I1','SI2','SI1','VS2','VVS2','VVS1','IF','FL'))
AND (mt7.meta_key = 'gemstone_weight' AND CAST(mt7.meta_value AS DECIMAL(10,2)) BETWEEN '0.67' AND '1.85')
AND (mt8.meta_key = 'gemstone_active_price' AND CAST(mt8.meta_value AS DECIMAL(10,2)) BETWEEN '960' AND '2300')
)
GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value+0 ASC
LIMIT 0, 20
I know it looks like a big mess, but the whole thing executes very quickly when I don't have the 2 IN
statements in the WHERE clause (mt5 and mt6 above). The problem is, I don't know SQL well enough to figure out another way to write the query that avoids using the IN
statements. Any ideas?
UPDATE:
Here is the EXPLAIN
output for this query in case it will help anyone. If anyone has any other thoughts, I'm open to anything. This has me completely stumped.
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE wp_postmeta ref post_id,meta_key meta_key 768 const 2 Using where; Using temporary; Using filesort
1 SIMPLE mt1 ref post_id,meta_key post_id 8 db.wp_postmeta.post_id 2 Using where
1 SIMPLE mt2 ref post_id,meta_key post_id 8 db.mt1.post_id 2 Using where
1 SIMPLE mt3 ref post_id,meta_key post_id 8 db.wp_postmeta.post_id 2 Using where
1 SIMPLE mt4 ref post_id,meta_key post_id 8 db.mt2.post_id 2 Using where
1 SIMPLE mt5 ref post_id,meta_key post_id 8 db.wp_postmeta.post_id 2 Using where
1 SIMPLE mt6 ref post_id,meta_key post_id 8 db.wp_postmeta.post_id 2 Using where
1 SIMPLE mt7 ref post_id,meta_key post_id 8 db.mt3.post_id 2 Using where
1 SIMPLE mt8 ref post_id,meta_key post_id 8 db.wp_postmeta.post_id 2 Using where
1 SIMPLE wp_posts eq_ref PRIMARY,type_status_date PRIMARY 8 db.wp_postmeta.post_id 1 Using where
UPDATE 2:
After some more experimentation, I have realized that it's not just the IN()
statements that are slowing this query down. It seems that any combination of more than one IN()
with more than 3 BETWEEN...AND...
statements has a dramatic impact on performance.
For example, the query executes in about 0.04s if I drop the last 2 AND
clauses (versus 4.9s with them), or it executes in 0.04s if I drop the 2 AND
clauses with the IN()
statements. This leads me to think that a 2 query solution might be best, but I have no idea how to implement that via the WordPress WP_Query
API, and if I did, I wonder if that would even be quicker than just doing one query and then filtering the results via PHP.
I hate the idea of filtering with PHP because I've read in several places that filtering should be left to the database because that's what databases are good at. By the way, if it makes any difference, I'm running these queries on a WordPress 3.3.1 installation on my localhost
WAMP server on a computer with plenty of processing power (Intel i7, 12 GB RAM, etc.).
UPDATE 3:
I was thinking about just giving up and removing all the IN()
clauses from the query and filter for those via PHP, but that has some serious draw backs. Besides being inefficient and a code smell, it won't allow me to control my pagination correctly. When everything is filtered in the database, I can simply use the LIMIT
clause to handle pagination. When I filter with PHP, I have no idea how many results will be returned for any given offset. So, all the filtering really needs to be done in the database, the question is how. Does anyone have any additional suggestions at all for me? Would any additional information be helpful to anyone?
UPDATE 4:
In my search for a solution to this, I posted it as an issue in the WordPress core trac system (http://core.trac.wordpress.org/ticket/20134). One of the devs there suggested I try to use taxonomies instead of meta data for anything that I use IN
for in my meta query. I took that advice, and I saw a performance improvement, but unfortunately, it wasn't nearly enough. The old query took 4+ seconds to run, and using taxonomies it got down to 1+ seconds. However, I realized that I actually need 4 IN
type clauses (not the original 2). With the 2 additional taxonomy clauses, the query takes 18+ seconds to execute. So, I'm back to square one. One thought I had (that's probably delusional) is that this could be running so slow because I have so few posts that meet the criteria. For testing purposes, I only have 3 posts in the database that have the 'gemstone'
post type. Could that have anything to do with it?
If anyone is interested, my new SQL looks like this:
SELECT SQL_CALC_FOUND_ROWS wp_posts.*
FROM wp_posts
INNER JOIN wp_term_relationships ON (wp_posts.ID = wp_term_relationships.object_id)
INNER JOIN wp_term_relationships AS tt1 ON (wp_posts.ID = tt1.object_id)
INNER JOIN wp_term_relationships AS tt2 ON (wp_posts.ID = tt2.object_id)
INNER JOIN wp_term_relationships AS tt3 ON (wp_posts.ID = tt3.object_id)
INNER JOIN wp_postmeta ON (wp_posts.ID = wp_postmeta.post_id)
INNER JOIN wp_postmeta AS mt1 ON (wp_posts.ID = mt1.post_id)
INNER JOIN wp_postmeta AS mt2 ON (wp_posts.ID = mt2.post_id)
INNER JOIN wp_postmeta AS mt3 ON (wp_posts.ID = mt3.post_id)
INNER JOIN wp_postmeta AS mt4 ON (wp_posts.ID = mt4.post_id)
INNER JOIN wp_postmeta AS mt5 ON (wp_posts.ID = mt5.post_id)
INNER JOIN wp_postmeta AS mt6 ON (wp_posts.ID = mt6.post_id)
WHERE 1=1
AND ( wp_term_relationships.term_taxonomy_id IN (71,72,73,74)
AND tt1.term_taxonomy_id IN (89,90,91,92,93,95,96,97)
AND tt2.term_taxonomy_id IN (56,50,104,53)
AND tt3.term_taxonomy_id IN (59,60,62)
)
AND wp_posts.post_type = 'gemstone'
AND (wp_posts.post_status = 'publish' OR wp_posts.post_status = 'future' OR wp_posts.post_status = 'draft' OR wp_posts.post_status = 'pending' OR wp_posts.post_status = 'private')
AND (wp_postmeta.meta_key = 'gemstone_weight'
AND (mt1.meta_key = 'gemstone_status' AND CAST(mt1.meta_value AS CHAR) = 'Available')
AND (mt2.meta_key = 'gemstone_length' AND CAST(mt2.meta_value AS DECIMAL(8,2)) BETWEEN '0' AND '9')
AND (mt3.meta_key = 'gemstone_width' AND CAST(mt3.meta_value AS DECIMAL(8,2)) BETWEEN '0' AND '9' )
AND (mt4.meta_key = 'gemstone_depth' AND CAST(mt4.meta_value AS DECIMAL(8,2)) BETWEEN '0' AND '7')
AND (mt5.meta_key = 'gemstone_weight' AND CAST(mt5.meta_value AS DECIMAL(8,2)) BETWEEN '0.81' AND '1.81')
AND (mt6.meta_key = 'gemstone_active_price' AND CAST(mt6.meta_value AS DECIMAL(8,2)) BETWEEN '1083.9' AND '2078.26')
)
GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value+0 ASC
LIMIT 0, 20
and the new EXPLAIN
output is as follows:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE wp_postmeta ref post_id,meta_key meta_key 768 const 3 Using where; Using temporary; Using filesort
1 SIMPLE tt3 ref PRIMARY,term_taxonomy_id PRIMARY 8 db.wp_postmeta.post_id 1 Using where; Using index
1 SIMPLE tt2 ref PRIMARY,term_taxonomy_id PRIMARY 8 db.wp_postmeta.post_id 1 Using where; Using index
1 SIMPLE wp_term_relationships ref PRIMARY,term_taxonomy_id PRIMARY 8 db.tt2.object_id 1 Using where; Using index
1 SIMPLE wp_posts eq_ref PRIMARY,type_status_date PRIMARY 8 db.wp_postmeta.post_id 1 Using where
1 SIMPLE tt1 ref PRIMARY,term_taxonomy_id PRIMARY 8 db.wp_posts.ID 1 Using where; Using index
1 SIMPLE mt5 ref post_id,meta_key post_id 8 db.wp_posts.ID 2 Using where
1 SIMPLE mt6 ref post_id,meta_key post_id 8 db.wp_posts.ID 2 Using where
1 SIMPLE mt1 ref post_id,meta_key post_id 8 db.mt5.post_id 2 Using where
1 SIMPLE mt2 ref post_id,meta_key post_id 8 db.mt1.post_id 2 Using where
1 SIMPLE mt3 ref post_id,meta_key post_id 8 db.tt2.object_id 2 Using where
1 SIMPLE mt4 ref post_id,meta_key post_id 8 db.tt3.object_id 2 Using where
UPDATE 5: Because of a comment, I recently took another stab at optimizing this query, but I was left with the conclusion that the SQL pretty much has to be setup the way it is. However, in testing some alternatives, I found that strangely the query runs much faster now. I haven't updated my MySQL server, so the only reason I can fathom is that WordPress updated their database structure in someway to improve performance. The exact same query shown in update 4 now takes approximately 2.4 seconds. Still far too long in my opinion (so I'm still using STRAIGHT_JOIN as shown in my answer below), but I was surprised by the improvement, and it makes me wonder if a different solution is out there that can optimize this even more. Here is the new EXPLAIN output. It looks almost identical to me, but I don't really know how to interpret it.
+-----+--------------+------------------------+---------+---------------------------+-----------+----------+-------------------------------------+-------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+-----+--------------+------------------------+---------+---------------------------+-----------+----------+-------------------------------------+-------+----------------------------------------------+
| 1 | SIMPLE | wp_postmeta | ref | post_id,meta_key | meta_key | 768 | const | 5 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | wp_term_relationships | ref | PRIMARY,term_taxonomy_id | PRIMARY | 8 | db.wp_postmeta.post_id | 1 | Using where; Using index |
| 1 | SIMPLE | tt2 | ref | PRIMARY,term_taxonomy_id | PRIMARY | 8 | db.wp_term_relationships.object_id | 1 | Using where; Using index |
| 1 | SIMPLE | tt3 | ref | PRIMARY,term_taxonomy_id | PRIMARY | 8 | db.wp_term_relationships.object_id | 1 | Using where; Using index |
| 1 | SIMPLE | wp_posts | eq_ref | PRIMARY,type_status_date | PRIMARY | 8 | db.wp_postmeta.post_id | 1 | Using where |
| 1 | SIMPLE | tt1 | ref | PRIMARY,term_taxonomy_id | PRIMARY | 8 | db.wp_posts.ID | 1 | Using where; Using index |
| 1 | SIMPLE | mt3 | ref | post_id,meta_key | post_id | 8 | db.tt2.object_id | 3 | Using where |
| 1 | SIMPLE | mt4 | ref | post_id,meta_key | post_id | 8 | db.tt3.object_id | 3 | Using where |
| 1 | SIMPLE | mt5 | ref | post_id,meta_key | post_id | 8 | db.wp_posts.ID | 3 | Using where |
| 1 | SIMPLE | mt6 | ref | post_id,meta_key | post_id | 8 | db.wp_posts.ID | 3 | Using where |
| 1 | SIMPLE | mt1 | ref | post_id,meta_key | post_id | 8 | db.mt5.post_id | 3 | Using where |
| 1 | SIMPLE | mt2 | ref | post_id,meta_key | post_id | 8 | db.mt3.post_id | 3 | Using where |
+-----+--------------+------------------------+---------+---------------------------+-----------+----------+-------------------------------------+-------+----------------------------------------------+
The "solution" I've stumbled on for now is pretty ugly, but for some inexplicable reason, it works. Adding the
STRAIGHT_JOIN
optimizer hint dropped the execution time from 18+ seconds to approximately 0.0022 seconds. Based on common sense and this question (When to use STRAIGHT_JOIN with MySQL), this solution seems like a bad idea, but it's the only thing I've tried that has worked. So, for now at least, I'm sticking to it. If anyone has any thoughts on why I shouldn't do this, or what I should try instead, I would love to hear them.If anyone is curious, I implemented it as a WordPress filter like so:
And for completeness, here is the
EXPLAIN
output for the query when usingSTRAIGHT_JOIN
. Again, I'm baffled. The old query used onlyref
andeq_ref
which I understand to be faster thanrange
, but this is orders of magnitude faster for some reason.Can you check if the "meta_value" field has an index? This could make a huge speed improvement.
I'm not sure about it, but maybe the CAST() functions cause some slowdown? Are they necessary?