MySQL optimization on join tables with range crite

2019-02-20 03:07发布

问题:

I am going to join two tables by using a single position in one table to the range (represented by two columns) in another table.

However, the performance is too slow, which is about 20 mins. I have tried adding the index on the table or changing the query. But the performance is still poor.

So, I am asking for optimization of the joining speed.


The following is the query to MySQL.

mysql> SELECT `inVar`.chrom, `inVar`.pos, `openChrom_K562`.score
    -> FROM `inVar`
    -> LEFT JOIN `openChrom_K562`
    -> ON (
    -> `inVar`.chrom=`openChrom_K562`.chrom AND
    -> `inVar`.pos BETWEEN `openChrom_K562`.chromStart AND `openChrom_K562`.chromEnd
    -> );

inVar and openChrom_K562 are the tables I used.

inVar stores the single position in each row.

openChrom_K562 stores the range information indicated by chromStart and chromEnd.

inVar contains 57902 rows and openChrom_K562 has 137373 rows respectively.


Fields on the tables.

mysql> DESCRIBE inVar;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| chrom | varchar(31) | NO   | PRI | NULL    |       |
| pos   | int(10)     | NO   | PRI | NULL    |       |
+-------+-------------+------+-----+---------+-------+

mysql> DESCRIBE openChrom_K562;
+------------+-------------+------+-----+---------+-------+
| Field      | Type        | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| chrom      | varchar(31) | NO   | MUL | NULL    |       |
| chromStart | int(10)     | NO   | MUL | NULL    |       |
| chromEnd   | int(10)     | NO   |     | NULL    |       |
| score      | int(10)     | NO   |     | NULL    |       |
+------------+-------------+------+-----+---------+-------+

Index built in the tables

mysql> SHOW INDEX FROM inVar;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| inVar |          0 | PRIMARY  |            1 | chrom       | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| inVar |          0 | PRIMARY  |            2 | pos         | A         |       57902 |     NULL | NULL   |      | BTREE      |         |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

mysql> SHOW INDEX FROM openChrom_K562;
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table          | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| openChrom_K562 |          1 | start_end   |            1 | chromStart  | A         |      137373 |     NULL | NULL   |      | BTREE      |         |
| openChrom_K562 |          1 | start_end   |            2 | chromEnd    | A         |      137373 |     NULL | NULL   |      | BTREE      |         |
| openChrom_K562 |          1 | chrom_only  |            1 | chrom       | A         |          22 |     NULL | NULL   |      | BTREE      |         |
| openChrom_K562 |          1 | chrom_start |            1 | chrom       | A         |          22 |     NULL | NULL   |      | BTREE      |         |
| openChrom_K562 |          1 | chrom_start |            2 | chromStart  | A         |      137373 |     NULL | NULL   |      | BTREE      |         |
| openChrom_K562 |          1 | chrom_end   |            1 | chrom       | A         |          22 |     NULL | NULL   |      | BTREE      |         |
| openChrom_K562 |          1 | chrom_end   |            2 | chromEnd    | A         |      137373 |     NULL | NULL   |      | BTREE      |         |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

Execution plan on MySQL

mysql> EXPLAIN SELECT `inVar`.chrom, `inVar`.pos, score  FROM `inVar`  LEFT JOIN `openChrom_K562`  ON ( inVar.chrom=openChrom_K562.chrom AND  `inVar`.pos BETWEEN chromStart AND chromEnd );
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
| id | select_type | table          | type  | possible_keys                              | key        | key_len | ref             | rows  | Extra       |
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
|  1 | SIMPLE      | inVar          | index | NULL                                       | PRIMARY    | 37      | NULL            | 57902 | Using index |
|  1 | SIMPLE      | openChrom_K562 | ref   | start_end,chrom_only,chrom_start,chrom_end | chrom_only | 33      | tmp.inVar.chrom |  5973 |             |
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+

It seems it only optimizes by looking chrom in two tables. Then do the brute-force comparing in the tables.

Is there any ways to do the further optimization like indexing on the position?

(It is my first time posting the question, sorry for the poor posting quality.)

回答1:

chrom_only is likely to be a bad index selection for your join as you only have chrom 22 values.

If I have interpreted this right the query should be faster if using start_end

SELECT `inVar`.chrom, `inVar`.pos, `openChrom_K562`.score
FROM `inVar`
LEFT JOIN `openChrom_K562`
USE INDEX (`start_end`)
ON (
`inVar`.chrom=`openChrom_K562`.chrom AND
`inVar`.pos BETWEEN `openChrom_K562`.chromStart AND `openChrom_K562`.chromEnd
)