How would using an ORDER BY clause both increase a

2019-04-12 17:34发布

I have a MySQL table called devicelog with it's PK on id, but multiple indices on device_id (INT), field_id (INT), and unixtime (BIGINT). They are just the default InnoDB indices.

I'm trying to get the ID next to a certain time, I get WAY different performance with different values and different ORDER BYs. IDs and unixtimes both have a positive association, since they both are increasing in order as more data gets inserted, so it seems like it would be okay to safely omit ordering on unixtime. My table has around 25 million records and performance is extremely vital.

This query is fairly slow (~0.5 seconds): Edit: after using USE INDEX(unixtime), I was able to increase performance quite a bit (<0.01 seconds!).

SELECT 
    id
FROM
    devicelog
USE INDEX(unixtime) /* edit: looking at the EXPLAIN, I can use this index and it sped things up a bit */
WHERE
    device_id = 26
        AND field_id = 64
        AND unixtime >= 1397166634707 /* a fairly recent time */
/* with no ORDER BY clause, this query is surprisingly slow */
LIMIT 1

EXPLAIN:

1, SIMPLE, devicelog, index_merge, device_id,field_id,field_id_2,unixtime, field_id,device_id, 8,8, , 6667, Using intersect(field_id,device_id); Using where

This query is extremely fast (<0.01 seconds):

SELECT 
    id
FROM
    devicelog
WHERE
    device_id = 26
        AND field_id = 64
        AND unixtime >= 1397166634707 /* a fairly recent time */
ORDER BY unixtime ASC                 /* <- using unixtime to order */
LIMIT 1

EXPLAIN:

1, SIMPLE, devicelog, range, device_id,field_id,field_id_2,unixtime, unixtime, 9, , 897776, Using index condition; Using where

How would omitting an ORDER BY decrease performance? It seems logical to think that it would increase speed.

Yet, if I change the unixtime to something far back, to "1", it will completely slow down when I use the ORDER BY unixtime. I believe the unixtime index is ordered ascendingly, so this doesn't make much sense either.

This query performs in an opposite manner as the queries above.

Extremely fast (<0.01 seconds):

SELECT 
    id
FROM
    devicelog
WHERE
    device_id = 26
        AND field_id = 64
        AND unixtime >= 1 /* a long time ago */
LIMIT 1

EXPLAIN:

1, SIMPLE, devicelog, index_merge, device_id,field_id,field_id_2,unixtime, field_id,device_id, 8,8, , 6742, Using intersect(field_id,device_id); Using where

This query is the exact same as the fast one, except it's using an older time:

EXTREMELY slow (~7 seconds):

SELECT 
    id
FROM
    devicelog
WHERE
    device_id = 26
        AND field_id = 64
        AND unixtime >= 1         /* a long time ago */
ORDER BY unixtime ASC             /* <- using unixtime to order */
LIMIT 1

EXPLAIN:

1, SIMPLE, devicelog, index, device_id,field_id,field_id_2,unixtime, unixtime, 9, , 3504, Using where

Does anyone have any insight on the vast performance differences?

2条回答
祖国的老花朵
2楼-- · 2019-04-12 18:08

It's hard to make clear suggestions about performance without knowing stuff like the number of rows in your table, and the exact structure of the table.

You might try a compound covering index on (unixtime, device_id, file_id, id). (Look up covering index if you don't know that term).

This will allow the unixtime part of your query to be satisfied with BTREE lookup, then the rest of your query can be satisfied with an index scan.

If you specify ORDER BY unixtime ASC LIMIT 1 you're telling the query engine to stop scanning that index (which is ordered by unixtime as soon as it gets a single hit.

I don't know why it sometimes keeps going on the scan for seven seconds when you omit the ORDER BY. It's possible it has to hunt for the matching device_id and file_id values.

查看更多
叛逆
3楼-- · 2019-04-12 18:15

I think it's documented behavior of LIMIT optimization, see http://dev.mysql.com/doc/refman/5.5/en/limit-optimization.html

Optimizing LIMIT Queries

MySQL sometimes optimizes a query that has a LIMIT row_count clause and no HAVING clause:

[...] If you use LIMIT row_count with ORDER BY, MySQL ends the sorting as soon as it has found the first row_count rows of the sorted result, rather than sorting the entire result. If ordering is done by using an index, this is very fast. If a filesort must be done, all rows that match the query without the LIMIT clause are selected, and most or all of them are sorted, before the first row_count are found. After the initial rows have been found, MySQL does not sort any remainder of the result set.

[...]

As soon as MySQL has sent the required number of rows to the client, it aborts the query unless you are using SQL_CALC_FOUND_ROWS.

Because you're trying to get the ID next to a certain date, I would think ordering the result very vital, because else you can get an arbitrary value. Else you've got to use MIN(id) with your conditions to get the desired id value.

查看更多
登录 后发表回答