I have the following query:
SELECT table_1.id
FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)
WHERE
table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)
AND (table_2.id is NULL OR table_1.date_col > table_2.date_col)
LIMIT 5000;
And I have the following keys and indexes:
- table_1.id primary key.
- index on table_1.col_condition_1
- index on table_1.col_condition_2
- composite index on table_1.col_condition_1 and table_1.col_condition_2
The correct indexes are getting picked up. Query explain:
+--+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------------+---------+------------+----------+-----------------------+--+
| | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | |
+--+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------------+---------+------------+----------+-----------------------+--+
| | 1 | SIMPLE | table_1 | range | "the composite index", col_condition_1 index ,col_condition_2 index | "the composite index" | 7 | | 11819433 | Using index condition | |
| | 1 | SIMPLE | table_2 | eq_ref | PRIMARY,id_UNIQUE | PRIMARY | 8 | table_1.id | 1 | Using where | |
+--+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------------+---------+------------+----------+-----------------------+--+
table_1 has ~60 MM records, and table_2 has ~4 MM records.
The query takes 60 seconds to return a result.
What's interesting is that:
SELECT table_1.id
FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)
WHERE
table_1.col_condition_1 = 0
AND table_1.col_condition_2 NOT IN (3, 4)
LIMIT 5000;
takes 145 ms to return a result and has the same indexes picked as the first query.
SELECT table_1.id
FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)
WHERE
table_1.col_condition_1 = 0
AND (table_2.id is NULL OR table_1.date_col > table_2.date_col)
LIMIT 5000;
takes 174 ms to return a result.
Query explain:
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-------------+
| 1 | SIMPLE | table_1 | ref | "the composite index", col_condition_1 index ,col_condition_2 index | col_condition_1 | 2 | const | 30381842 | NULL |
| 1 | SIMPLE | table_2 | eq_ref | PRIMARY,id_UNIQUE | PRIMARY | 8 | table_1.id | 1 | Using where |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-------------+
And
SELECT table_1.id
FROM
table_1
LEFT JOIN table_2 ON (table_1.id = table_2.id)
WHERE
table_1.col_condition_2 NOT IN (3, 4)
AND (table_2.id is NULL OR table_1.date_col > table_2.date_col)
LIMIT 5000;
takes about 1 second to return a result.
Query explain:
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-----------------------+
| 1 | SIMPLE | table_1 | range | "the composite index", col_condition_1 index ,col_condition_2 index | col_condition_2 | 5 | | 36254294 | Using index condition |
| 1 | SIMPLE | table_2 | eq_ref | PRIMARY,id_UNIQUE | PRIMARY | 8 | table_1.id | 1 | Using where |
+----+-------------+---------+--------+---------------------------------------------------------------------+-----------------+---------+------------+----------+-----------------------+
Also when I use every where condition separately, the query returns a result in ~100 ms.
My question is why the query takes a significant amount of time (60 seconds) to return a result when using the three where conditions together even though it looks like the correct indexes are getting used and executing the query with any two of the three where conditions also returns a result in much less time.
Also, is there a way to optimize this query?
Thank you.
EDIT:
create tables:
table_1:
CREATE TABLE `table_1` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`col_condition_1` tinyint(1) DEFAULT '0',
`col_condition_2` int(11) DEFAULT NULL,
`date_col` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `compositeidx` (`col_condition_1`,`col_condition_2`),
KEY `col_condition_1_idx` (`col_condition_1`),
KEY `col_condition_2_idx` (`col_condition_2`)
) ENGINE=InnoDB AUTO_INCREMENT=68272192 DEFAULT CHARSET=utf8
table_2:
CREATE TABLE `table_2` (
`id` bigint(20) NOT NULL,
`date_col` timestamp NULL DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1