Why are UNION queries so slow in MySQL?

Posted 2019-01-26 07:20

Question:

When I optimize my 2 single queries to run in less than 0.02 seconds and then UNION them the resulting query takes over 1 second to run. Also, a UNION ALL takes longer than a UNION DISTINCT. I would assume allowing duplicates would make the query run faster and not slower. Am I really just better off running the 2 queries separately? I would prefer to use the UNION.

Answer 1:

When I optimize my 2 single queries to run in less than 0.02 seconds and then UNION them the resulting query takes over 1 second to run.

Do your queries include ORDER BY … LIMIT clauses?

If you put an ORDER BY … LIMIT after a UNION, it is applied to the result of the whole UNION, and an index cannot be used for the sort in this case: MySQL has to collect the combined rows first and then sort them.

If id is a primary key, this query will be instant:

SELECT  *
FROM    mytable
ORDER BY id
LIMIT 1

but this one will not:

SELECT  *
FROM    mytable
UNION ALL
SELECT  *
FROM    mytable
ORDER BY id
LIMIT 1
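
A common rewrite, sketched here under the assumption that each branch can be ordered by the primary key on id, is to push the ORDER BY … LIMIT into each branch so that every branch returns a single row, and then sort only those few rows. The name mytable is a placeholder for the table in the example above.

```sql
-- Each parenthesized branch can use the primary key on id to find its own
-- first row, so the outer ORDER BY ... LIMIT only has to sort two rows.
-- `mytable` is a placeholder table name.
(SELECT * FROM mytable ORDER BY id LIMIT 1)
UNION ALL
(SELECT * FROM mytable ORDER BY id LIMIT 1)
ORDER BY id
LIMIT 1;
```

For the result to match the original query, the LIMIT inside each branch has to be at least as large as the outer LIMIT (plus any OFFSET).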

Also, a UNION ALL takes longer than a UNION DISTINCT. I would assume allowing duplicates would make the query run faster and not slower.

This also seems to be due to the ORDER BY. UNION DISTINCT removes duplicate rows before the final sort, and sorting a smaller set is faster than sorting a larger one.

Am I really just better off running the 2 queries separately? I would prefer to use the UNION

Do you need the resulting set to be sorted?

If not, just get rid of the final ORDER BY.
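
One way to check whether the final sort is what costs the time is to compare the execution plans with and without it. This is only a sketch: mytable is a placeholder name, and the exact EXPLAIN output depends on the MySQL version.

```sql
-- Plan for the UNION with a trailing ORDER BY ... LIMIT. Look at the row
-- for the combined result (select_type UNION RESULT): an Extra value that
-- mentions a temporary table and a filesort means the whole set is sorted.
EXPLAIN
SELECT * FROM mytable
UNION ALL
SELECT * FROM mytable
ORDER BY id
LIMIT 1;

-- The same UNION without the final ORDER BY ... LIMIT, for comparison.
EXPLAIN
SELECT * FROM mytable
UNION ALL
SELECT * FROM mytable;
```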



Answer 2:

A guess: since you query one table in both branches of the UNION, MySQL might have difficulty deciding on a locking strategy for the table. Or it might attempt some caching that doesn't work here because you query disjoint sets, or try to multithread the access (very reasonable) and run into locking, concurrency, or file-seeking issues.

UNIONs might also generally use a stricter consistency setting, since the two SELECTs have to be consistent with each other. If you run them in separate transactions, they do not.

Experiment: make a duplicate of the table and UNION the original with the copy. If I'm right, it should be faster.
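
The experiment could be set up roughly as follows. The names mytable and mytable_copy are placeholders, and the UNION shown is the example from the first answer rather than the asker's real query.

```sql
-- Create an empty copy with the same columns and indexes, then fill it.
-- `mytable` and `mytable_copy` are placeholder names.
CREATE TABLE mytable_copy LIKE mytable;
INSERT INTO mytable_copy SELECT * FROM mytable;

-- Run the same UNION, reading the second branch from the copy.
SELECT * FROM mytable
UNION ALL
SELECT * FROM mytable_copy
ORDER BY id
LIMIT 1;
```

If the timing does not change, the single-table locking guess above is probably not the cause.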

Possible solution: split the single file into multiple files to allow for better concurrency strategies. This wouldn't (or shouldn't) help with locking issues, but it rules out the multithreading/seeking problems in the database.

It would be useful to know which storage engine you use.
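
The storage engine can be looked up with either of the statements below; mytable is again a placeholder table name.

```sql
-- The Engine column shows the storage engine (for example InnoDB or MyISAM).
SHOW TABLE STATUS LIKE 'mytable';

-- Equivalent lookup through information_schema for the current database.
SELECT TABLE_NAME, ENGINE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'mytable';
```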

Well, just my 2 cents. I can't test this here right now.



Answer 3:

Could it be that you are measuring response time rather than the time to retrieve all the data?
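
One rough way to take client-side fetching out of the measurement is to wrap each query in a COUNT(*), so the server has to produce every row but sends back only one. This is a sketch with a placeholder table name, and the optimizer may choose a different plan for the wrapped query, so treat the timings as indicative only.

```sql
-- Time a single branch with all rows produced on the server.
SELECT COUNT(*) FROM (
    SELECT * FROM mytable
) AS single_branch;

-- Time the UNION the same way and compare.
SELECT COUNT(*) FROM (
    SELECT * FROM mytable
    UNION ALL
    SELECT * FROM mytable
) AS both_branches;
```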