I am going to join two tables by using a single position in one table to the range (represented by two columns) in another table.
However, the performance is too slow, which is about 20 mins. I have tried adding the index on the table or changing the query. But the performance is still poor.
So, I am asking for optimization of the joining speed.
The following is the query to MySQL.
mysql> SELECT `inVar`.chrom, `inVar`.pos, `openChrom_K562`.score
-> FROM `inVar`
-> LEFT JOIN `openChrom_K562`
-> ON (
-> `inVar`.chrom=`openChrom_K562`.chrom AND
-> `inVar`.pos BETWEEN `openChrom_K562`.chromStart AND `openChrom_K562`.chromEnd
-> );
inVar
and openChrom_K562
are the tables I used.
inVar
stores the single position in each row.
openChrom_K562
stores the range information indicated by chromStart
and chromEnd
.
inVar
contains 57902 rows and openChrom_K562
has 137373 rows respectively.
Fields on the tables.
mysql> DESCRIBE inVar;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| chrom | varchar(31) | NO | PRI | NULL | |
| pos | int(10) | NO | PRI | NULL | |
+-------+-------------+------+-----+---------+-------+
mysql> DESCRIBE openChrom_K562;
+------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+-------+
| chrom | varchar(31) | NO | MUL | NULL | |
| chromStart | int(10) | NO | MUL | NULL | |
| chromEnd | int(10) | NO | | NULL | |
| score | int(10) | NO | | NULL | |
+------------+-------------+------+-----+---------+-------+
Index built in the tables
mysql> SHOW INDEX FROM inVar;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| inVar | 0 | PRIMARY | 1 | chrom | A | NULL | NULL | NULL | | BTREE | |
| inVar | 0 | PRIMARY | 2 | pos | A | 57902 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
mysql> SHOW INDEX FROM openChrom_K562;
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| openChrom_K562 | 1 | start_end | 1 | chromStart | A | 137373 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | start_end | 2 | chromEnd | A | 137373 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_only | 1 | chrom | A | 22 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_start | 1 | chrom | A | 22 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_start | 2 | chromStart | A | 137373 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_end | 1 | chrom | A | 22 | NULL | NULL | | BTREE | |
| openChrom_K562 | 1 | chrom_end | 2 | chromEnd | A | 137373 | NULL | NULL | | BTREE | |
+----------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Execution plan on MySQL
mysql> EXPLAIN SELECT `inVar`.chrom, `inVar`.pos, score FROM `inVar` LEFT JOIN `openChrom_K562` ON ( inVar.chrom=openChrom_K562.chrom AND `inVar`.pos BETWEEN chromStart AND chromEnd );
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
| 1 | SIMPLE | inVar | index | NULL | PRIMARY | 37 | NULL | 57902 | Using index |
| 1 | SIMPLE | openChrom_K562 | ref | start_end,chrom_only,chrom_start,chrom_end | chrom_only | 33 | tmp.inVar.chrom | 5973 | |
+----+-------------+----------------+-------+--------------------------------------------+------------+---------+-----------------+-------+-------------+
It seems it only optimizes by looking chrom
in two tables. Then do the brute-force comparing in the tables.
Is there any ways to do the further optimization like indexing on the position?
(It is my first time posting the question, sorry for the poor posting quality.)