How to avoid “Using temporary” in many-to-many que

2019-01-09 17:19发布

问题:

This query is very simple, all I want to do, is get all the articles in given category ordered by last_updated field:

SELECT
    `articles`.*
FROM
    `articles`,
    `articles_to_categories`
WHERE
        `articles`.`id` = `articles_to_categories`.`article_id`
        AND `articles_to_categories`.`category_id` = 1
ORDER BY `articles`.`last_updated` DESC
LIMIT 0, 20;

But it runs very slow. Here is what EXPLAIN said:

select_type  table                   type     possible_keys           key         key_len  ref                                rows  Extra
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SIMPLE       articles_to_categories  ref      article_id,category_id  article_id  5        const                              5016  Using where; Using temporary; Using filesort
SIMPLE       articles                eq_ref   PRIMARY                 PRIMARY     4        articles_to_categories.article_id  1

Is there a way to rewrite this query or add additional logic to my PHP scripts to avoid Using temporary; Using filesort and speed thing up?

The table structure:

*articles*
id | title | content | last_updated

*articles_to_categories*
article_id | category_id

UPDATE

I have last_updated indexed. I guess my situation is explained in documentation:

In some cases, MySQL cannot use indexes to resolve the ORDER BY, although it still uses indexes to find the rows that match the WHERE clause. These cases include the following:

The key used to fetch the rows is not the same as the one used in the ORDER BY: SELECT * FROM t1 WHERE key2=constant ORDER BY key1;

You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)

but I still have no idea how to fix this.

回答1:

Here's a simplified example I did for a similar performance related question sometime ago that takes advantage of innodb clustered primary key indexes (obviously only available with innodb !!)

  • http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
  • http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/

You have 3 tables: category, product and product_category as follows:

drop table if exists product;
create table product
(
prod_id int unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 

drop table if exists category;
create table category
(
cat_id mediumint unsigned not null auto_increment primary key,
name varchar(255) not null unique
)
engine = innodb; 

drop table if exists product_category;
create table product_category
(
cat_id mediumint unsigned not null,
prod_id int unsigned not null,
primary key (cat_id, prod_id) -- **note the clustered composite index** !!
)
engine = innodb;

The most import thing is the order of the product_catgeory clustered composite primary key as typical queries for this scenario always lead by cat_id = x or cat_id in (x,y,z...).

We have 500K categories, 1 million products and 125 million product categories.

select count(*) from category;
+----------+
| count(*) |
+----------+
|   500000 |
+----------+

select count(*) from product;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+

select count(*) from product_category;
+-----------+
| count(*)  |
+-----------+
| 125611877 |
+-----------+

So let's see how this schema performs for a query similar to yours. All queries are run cold (after mysql restart) with empty buffers and no query caching.

select
 p.*
from
 product p
inner join product_category pc on 
    pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
 p.prod_id desc -- sry dont a date field in this sample table - wont make any difference though
limit 20;

+---------+----------------+
| prod_id | name           |
+---------+----------------+
|  993561 | Product 993561 |
|  991215 | Product 991215 |
|  989222 | Product 989222 |
|  986589 | Product 986589 |
|  983593 | Product 983593 |
|  982507 | Product 982507 |
|  981505 | Product 981505 |
|  981320 | Product 981320 |
|  978576 | Product 978576 |
|  973428 | Product 973428 |
|  959384 | Product 959384 |
|  954829 | Product 954829 |
|  953369 | Product 953369 |
|  951891 | Product 951891 |
|  949413 | Product 949413 |
|  947855 | Product 947855 |
|  947080 | Product 947080 |
|  945115 | Product 945115 |
|  943833 | Product 943833 |
|  942309 | Product 942309 |
+---------+----------------+
20 rows in set (0.70 sec) 

explain
select
 p.*
from
 product p
inner join product_category pc on 
    pc.cat_id = 4104 and pc.prod_id = p.prod_id
order by
 p.prod_id desc -- sry dont a date field in this sample table - wont make any diference though
limit 20;

+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys | key     | key_len | ref           | rows | Extra                                        |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
|  1 | SIMPLE      | pc    | ref    | PRIMARY       | PRIMARY | 3       | const           |  499 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | p     | eq_ref | PRIMARY       | PRIMARY | 4       | vl_db.pc.prod_id |    1 |                                              |
+----+-------------+-------+--------+---------------+---------+---------+------------------+------+----------------------------------------------+
2 rows in set (0.00 sec)

So that's 0.70 seconds cold - ouch.

Hope this helps :)

EDIT

Having just read your reply to my comment above it seems you have one of two choices to make:

create table articles_to_categories
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(article_id, category_id), -- good for queries that lead with article_id = x
key (category_id)
)
engine=innodb;

or.

create table categories_to_articles
(
article_id int unsigned not null,
category_id mediumint unsigned not null,
primary key(category_id, article_id), -- good for queries that lead with category_id = x
key (article_id)
)
engine=innodb;

depends on your typical queries as to how you define your clustered PK.



回答2:

You should be able to avoid filesort by adding a key on articles.last_updated. MySQL needs the filesort for the ORDER BY operation, but can do it without filesort as long as you order by an indexed column (with some limitations).

For much more info, see here: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html



回答3:

I assume you have made the following in your db:

1) articles -> id is a primary key

2) articles_to_categories -> article_id is a foreign key of articles -> id

3) you can create index on category_id



回答4:

ALTER TABLE articles ADD INDEX (last_updated);
ALTER TABLE articles_to_categories ADD INDEX (article_id);

should do it. The right plan is to find the first few records using the first index and do the JOIN using the second one. If it doesn't work, try STRAIGHT_JOIN or something to enforce proper index usage.