Why does MySQL not use optimal indexes

2020-03-30 07:30发布

问题:

I'm trying to optimize my query, however, MySQL seems to be utilizing non-optimal indexes on the query and I can't seem to figure out what is wrong. My query is as follows:

SELECT  SQL_CALC_FOUND_ROWS deal_ID AS ID,dealTitle AS dealSaving,
       storeName AS title,deal_URL AS dealURL,dealDisclaimer,
       dealType, providerName,providerLogo AS providerIMG,createDate,
       latitude AS lat,longitude AS lng,'local' AS type,businessType,
       address1,city,dealOriginalPrice,NULL AS dealDiscountPercent,
       dealPrice,scoringBase, smallImage AS smallimage,largeImage AS image,
       storeURL AS storeAlias,
       exp(-power(greatest(0, 
             abs(69.0*DEGREES(ACOS(0.82835377099147 *
               COS(RADIANS(latitude)) * COS(RADIANS(-118.4-longitude)) +
               0.56020534635454*SIN(RADIANS(latitude)))))-2),
                       2)/(5.7707801635559)) *
            scoringBase * IF(submit_ID IN (18381),
               IF(businessType = 1,1.3,1.2),IF(submit_ID IN (54727),1.19, 1)
                          ) AS distance
    FROM  local_deals
    WHERE  latitude BETWEEN 33.345362318841 AND 34.794637681159
      AND  longitude BETWEEN -119.61862872928 AND -117.18137127072
      AND  state = 'CA'
      AND  country = 'US'
    ORDER BY  distance DESC
    LIMIT  48 OFFSET 0; 

Listing the indexes on the table reveals:

+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name        | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| local_deals |          0 | PRIMARY         |            1 | id              | A         |      193893 |     NULL | NULL   |      | BTREE      |         |               |
| local_deals |          0 | unique_deal_ID  |            1 | deal_ID         | A         |      193893 |     NULL | NULL   |      | BTREE      |         |               |
| local_deals |          1 | deal_ID         |            1 | deal_ID         | A         |      193893 |     NULL | NULL   |      | BTREE      |         |               |
| local_deals |          1 | store_ID        |            1 | store_ID        | A         |      193893 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | storeOnline_ID  |            1 | storeOnline_ID  | A         |           3 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | storeChain_ID   |            1 | storeChain_ID   | A         |         117 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | userProvider_ID |            1 | userProvider_ID | A         |           5 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | expirationDate  |            1 | expirationDate  | A         |        3127 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | createDate      |            1 | createDate      | A         |       96946 |     NULL | NULL   | YES  | BTREE      |         |               | 
| local_deals |          1 | city            |            1 | city            | A         |       17626 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | state           |            1 | state           | A         |         138 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | zip             |            1 | zip             | A         |       38778 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | country         |            1 | country         | A         |          39 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | latitude        |            1 | latitude        | A         |      193893 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | longitude       |            1 | longitude       | A         |      193893 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | eventDate       |            1 | eventDate       | A         |        4215 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | isNowDeal       |            1 | isNowDeal       | A         |           3 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | businessType    |            1 | businessType    | A         |           5 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | dealType        |            1 | dealType        | A         |           5 |     NULL | NULL   | YES  | BTREE      |         |               |
| local_deals |          1 | submit_ID       |            1 | submit_ID       | A         |           5 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------+------------+-----------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Running explain extended reveals:

+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
| id   | select_type | table       | type | possible_keys                    | key   | key_len | ref   | rows  | filtered | Extra                                              |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+
|    1 | SIMPLE      | local_deals | ref  | state,country,latitude,longitude | state | 35      | const | 52472 |   100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+------+----------------------------------+-------+---------+-------+-------+----------+----------------------------------------------------+

There are around 200k rows in the table. What is strange is that it is ignoring the latitude and longitude indexes as those should filter the table more. Running a query where I remove the "state" and "country" where commands reveals the following explain:

+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
| id   | select_type | table       | type  | possible_keys      | key       | key_len | ref  | rows  | filtered | Extra                                              |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+
|    1 | SIMPLE      | local_deals | range | latitude,longitude | longitude | 5       | NULL | 30662 |   100.00 | Using index condition; Using where; Using filesort |
+------+-------------+-------------+-------+--------------------+-----------+---------+------+-------+----------+----------------------------------------------------+

This shows that the longitude index would better filter the table to 30,662 rows. Am I missing something here? How can I get MySQL to use all queries. Note that the table is InnoDB and I'm using MySQL 5.5.

回答1:

The best index for your query is a composite index on (country, state, latitude, longitude) (country and state could be swapped). MySQL has good documentation on multi-column indexes, which is here.

Basically, latitude and longitude are not particularly selective individually. Unfortunately, the standard B-tree index only supports one inequality, and your query has two.

Actually, if you want GIS processing, then you should use a spatial extension to MySQL.



回答2:

Depending on the size of your table, Gordon's suggested index may be "good enough". If you need to get even more performance, you need to go to a 2D partitioning technique, wherein you partition on latitude and arrange for the InnoDB PRIMARY KEY to begin with longitude. More details, and sample code, are available in my article.



回答3:

A generic technique for problems like this is to build a subquery with these properties:

  • It returns no more than LIMIT rows; and those are all that you need.
  • There is a "covering index" for the columns involved, plus the PRIMARY KEY.
  • You are using InnoDB.

Something like

SELECT b. ..., a.distance
    FROM  local_deals b
    JOIN  (
        SELECT id,
               (...) AS distance,
            FROM local_deals
            WHERE  latitude  BETWEEN   33.34536 AND   34.79464
              AND  longitude BETWEEN -119.61863 AND -117.18137
              AND  state = 'CA'
              AND  country = 'US'
            ORDER BY  distance ASC
            LIMIT  48 OFFSET 0
          ) AS a  ON b.id = a.id
    ORDER BY a.distance;

INDEX(country, state, latitude, longitude, id)  -- `id` is the PK
-- country and state first (because of '='); id last.

Why this helps...

  • The index is "covering", so the lengthy scan (of a lot more than 48 rows) is done entirely in the index's BTree. This cuts down on I/O for huge tables.
  • All the other fields (b.*) are not hauled around through tmp tables, etc. Only 48 are sets of those fields are dealt with.
  • The 48 lookups by id are especially efficient in InnoDB due to the "clustered PK".

When working with "huge" tables, where I/O dominates, this technique can be counted thus:

  • 1, or a small number of, blocks in the index are needed for the subquery. Note that the desired records are consecutive, or nearly so. (OK, if there are 30K to look through, it could be more than 100 blocks; hence my comment about shrinking the bounding box to start with.)
  • Then 48 (LIMIT) random fetches via id get the 48 rows.

Without the subquery, the bulky rows need to be fetched. And, depending on the index used, that could be up to 30K blocks fetched. That's orders of magnitude slower.

Also, 48 rows versus 30K rows will be written to a tmp table for sorting (ORDER BY).