MySQL Query Optimisation - JOIN?

One for you MySQL experts :-)

I have the following query:

SELECT o.*, p.name, p.amount, p.quantity 
FROM orders o, products p 
WHERE o.id = p.order_id AND o.total != '0.00' AND DATE(o.timestamp) BETWEEN '2012-01-01' AND '2012-01-31' 
ORDER BY o.timestamp ASC

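orders table = 80900 rows
products table = 125389 rows
o.id and p.order_id are indexed

The query takes about 6 seconds to complete, which is far too long. I'm looking for a way to optimise it, possibly using temporary tables or a different type of join. I'm afraid my understanding of both of these concepts is fairly limited.

Can anyone suggest a way for me to optimise this query?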

Answer 1:

First, I would use a different style of syntax. ANSI-92 has had 20 years to bed in, and many RDBMS actually recommend not using the notation you have used. It's not going to make a difference in this case, but it really is very good practice for a host of reasons (that I'll let you investigate and make a decision on yourself).

Final answer, and example syntax:

SELECT
  orders.*, products.name, products.amount, products.quantity
FROM
  orders
INNER JOIN
  products
    ON orders.id = products.order_id 
WHERE
      orders.timestamp >= '2012-01-01'
  AND orders.timestamp <  '2012-02-01'
  AND orders.total     != '0.00' 
ORDER BY
  orders.timestamp ASC

As the orders table is the one you are making the initial filtering on, that's a very good place to start looking at optimisation.

With DATE(o.timestamp) BETWEEN x AND y you succeed in getting all dates and times in January. But that requires calling the DATE() function on every single row in the orders table (the row-by-row processing often called RBAR, "row by agonising row"). The RDBMS can't see through the function to work out how to avoid that wasted effort. Instead we need to do that optimisation ourselves, by rearranging the maths so that no function is applied to the field we are filtering on.

    orders.timestamp >= '2012-01-01'
AND orders.timestamp <  '2012-02-01'

This version allows the optimiser to know that you want a block of dates that are all sequential with each other. It's called a range-seek. It can use an index to very quickly find the first record and last record that fit that range, then pick out every record in between. That avoids checking all the records that don't fit, and even avoids checking all the records in the middle of the range; only the boundaries need to be sought out.

That assumes all the records are ordered by date, and that the optimiser can see that. To do so you need an index. With that in mind there seem to be two basic composite indexes that you could use:
- (id, timestamp)
- (timestamp, id)

The first is what I see people use the most. But that forces the optimiser to do the timestamp range-seek for each id separately. And since every id likely has a different timestamp value, you've gained nothing.

The second index is what I recommend.
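
As a rough sketch, the recommended index could be created like this. The index name idx_orders_timestamp_id is made up for illustration, and the backticks around timestamp are only a precaution because it is also a type name in MySQL.

```sql
-- Hypothetical index name; the column order (timestamp first, then id) is the point.
-- With timestamp as the leading column, the date range can be located with a range seek.
CREATE INDEX idx_orders_timestamp_id
    ON orders (`timestamp`, id);
```

If orders is an InnoDB table and id is its primary key (neither is stated in the question), the primary key is already stored in every secondary index, so an index on timestamp alone would probably behave much the same.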

Now, the optimiser can fulfil this part of your query exceptionally quickly...

SELECT
  orders.*
FROM
  orders
WHERE
      orders.timestamp >= '2012-01-01'
  AND orders.timestamp <  '2012-02-01'
ORDER BY
  orders.timestamp ASC

As it happens, even the ORDER BY has been optimised with the suggested index. It's already in the order that you want the data to be output. There is no need to re-sort everything after the join.

Then, to fulfil the total != '0.00' requirement, every row in your range is still checked. But you've already narrowed the range down so much that this will probably be fine. (I won't go into it, but you will likely find it impossible to use indexes in MySQL to optimise both this and the timestamp range-seek.)

Then, you have your join. That's optimised by an index you already have (products.order_id). For every record picked out by the snippet above, the optimiser can do an index seek and very quickly identify the matching record(s).

This all assumes that, in the vast majority of cases, every order row has one or more product rows. If, for example, only a very select few orders had any product rows, it may be faster to pick out the product rows of interest first; essentially looking at the joins happening in reverse order.

The optimiser actually makes that decision for you, but it's handy to know that it's doing that, then provide the indexes you estimate will be most useful to it.

You can check the explain plan to see if the indexes are being used. If not, your attempt to help was ignored, probably because the statistics of the data implied that a different join order was better. If so, you can then provide indexes to help that join order instead.
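
For illustration, checking the plan could look something like the following. It simply prefixes the rewritten query with EXPLAIN; the exact output depends on your MySQL version and on the indexes that actually exist on your tables.

```sql
-- Ask MySQL how it intends to execute the query, without running it.
EXPLAIN
SELECT
  orders.*, products.name, products.amount, products.quantity
FROM
  orders
INNER JOIN
  products
    ON orders.id = products.order_id
WHERE
      orders.timestamp >= '2012-01-01'
  AND orders.timestamp <  '2012-02-01'
  AND orders.total     != '0.00'
ORDER BY
  orders.timestamp ASC;
```

In the result, the key column shows which index (if any) was chosen for each table, a type of range on the orders row suggests the timestamp range-seek is being used, and "Using filesort" in the Extra column means the ORDER BY is still being sorted separately.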

Answer 2:

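Use EXPLAIN to show how the query is being optimised. I would suggest starting with an index on total and timestamp.
You may find that taking out the DATE function improves performance.
You should use the modern join syntax.

For example: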

SELECT o.*, p.name, p.amount, p.quantity  
FROM orders o
     inner join products p  
     on o.id = p.order_id 
WHERE o.total != '0.00' 
AND o.timestamp BETWEEN '2012-01-01' AND '2012-01-31 23:59:59'  
ORDER BY o.timestamp ASC

Answer 3:

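I'm not a MySQL expert (more of a SQL Server person), but I think you'd be better off with an index on o.timestamp, and you need to rewrite the query like this: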

o.timestamp >= '2012-01-01' and o.timestamp < '2012-01-31' + INTERVAL 1 DAY

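The logic is this: if you compare an expression on the column against a constant, the index can't be used. You need to compare the column itself against a constant.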

Answer 4:

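SELECT *:

Selecting all columns with the * wildcard will cause the query's meaning and behaviour to change if the table's schema changes, and might cause the query to retrieve too much data.

The != operator is non-standard:

Use the <> operator to test for inequality instead.

Aliasing without the AS keyword:

Explicitly using the AS keyword in column or table aliases, such as "tbl AS alias", is more readable than implicit aliases such as "tbl alias".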
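
Put together, those three points might look roughly like this. The orders column list is an assumption: only id, total and timestamp are known from the question, so list whichever columns you actually need in place of o.*.

```sql
SELECT
  o.id,                       -- explicit columns instead of o.* (list is an assumption)
  o.total,
  o.timestamp,
  p.name,
  p.amount,
  p.quantity
FROM
  orders AS o                 -- explicit AS for table aliases
INNER JOIN
  products AS p
    ON o.id = p.order_id
WHERE
      o.timestamp >= '2012-01-01'
  AND o.timestamp <  '2012-02-01'
  AND o.total     <> '0.00'   -- standard inequality operator instead of !=
ORDER BY
  o.timestamp ASC;
```

These are style and maintainability changes. On their own they are unlikely to make the query noticeably faster, although retrieving fewer columns can reduce the amount of data returned.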