MySQL slow query with join even though EXPLAIN sho

2019-06-23 15:08发布

问题:

I have the following scenario: In a MySQL database, I have 2 MyISAM tables, one with 4.2 million rows, and another with 320 million rows. The following is the schema for the tables:

Table1 (4.2M rows)

F1 INTEGER UNSIGNED NOT NULL PRIMARY KEY
f2 varchar(40)
f3 varchar(40)
f4 varchar(40)
f5 varchar(40)
f6 smallint(6)
f7 smallint(6)
f8 varchar(40)
f9 varchar(40)
f10 smallint(6)
f11 varchar(10)
f12 tinyint(4)
f13 smallint(6)
f14 text

Table2 (320M rows)

F1 INTEGER UNSIGNED NOT NULL PRIMARY KEY
f2 INTEGER UNSIGNED NOT NULL

Table2 is in a different database but I am using a stored procedure which queries the two tables. The relation between the two tables is that for Table1.F1 there may be up to approx. 100 rows in Table2.F1 (foreign key) which match, and the value for Table2.f2 will be returned for these matched keys. I have an index IX1(f2(15),f3(10)) on Table1 and an index IX2(F1,f2) and IX3(f2) in Table 2

The queries I am running are the following:

SELECT g.F1
FROM DB1.Table1 g 
INNER JOIN DB2.Table2 gp ON g.F1 = gp.F1 
WHERE (gp.f2 = 452677825) AND
(g.f2 = 'A string value') LIMIT 0,56

This query is sometimes very fast (<1s) but changing the string value that g.F2 is compared to leads to queries which take even over 11 and sometimes even 30 seconds. I cannot understand why this is so. The following is the output of the EXPLAIN on the SELECT that is executed.

1, 'SIMPLE', 'g', 'ref', 'PRIMARY,IX1', 'IX1', '17', 'const', 901, 'Using where'
1, 'SIMPLE', 'gp', 'ref', 'IX3,IX2', 'IX2', '8', 'DB1.g.F1,const', 1, 'Using index'

which seems to be quite a good execution plan. The number of rows in the top row of the explain goes to 2000 at most, but I do not see why this should take any longer than a fraction of a second to return results. I also ran profiler on the query and noticed that the queries are spending 99.9% of the time on the "Sending data" stage. Can anyone please explain why this is so, and what can be done to optimise the query?

Thanks in advance, Tim

回答1:

I'm not an expert in this area, but here are a few thoughts:

Query speed taking longer when g.F2 changes is because of caching. MySQL will save the results for each query (until the cache is full), but new queries are run on an empty cache, so they take longer. You shouldn't optimize based on this. (See How to measure accurately)

I can't tell from your information whether the g or gp table has greater specificity (seems like gp?) in the where clause, but you may want to try a subquery instead. (See How to force the inner query to execute first)

Regarding profiling, it's possible you're hitting a physical threshold like exceeding ram allocation (using swap is disastrous for performance) that would not be obvious from explain, or whether explain is just wrong in this case.



回答2:

If you are able to you may want to try tweaking your my.cnf, the property you want to play with is key_buffer_size. MyISAM indexes are stored in .MYI files if you locate these and total up the file sizes (e.g. ls -lh /var/lib/mysql/dbname/*.MYI) you can roughly estimate how big the key buffer needs to be to fit all of your indexes in. The MySQL docs does recommend not to exceed 25% of system memory though.



回答3:

The relation between the two tables is that for Table1.F1 there may be up to approx. 100 rows in Table2.F1

To clarify, is the relationship between Table1.F1 and Table2.F1 one-to-one, or one to many? To me, this statement implies one-to-many, but from the schema, each of the fields are primary (i.e. unique) keys.

At any rate, I suspect that the uniform of g.f2(15) is not uniform, and that when the statistical outliers are hit, performance degrades accordingly.

Do the results of

SELECT f2(15) AS f2_15, COUNT(*) AS cnt
FROM Table1
GROUP BY f2(15) 
ORDER BY cnt DESC

show some significant outliers?