CREATE TABLE `files` (
`did` int(10) unsigned NOT NULL DEFAULT '0',
`filename` varbinary(200) NOT NULL,
`ext` varbinary(5) DEFAULT NULL,
`fsize` double DEFAULT NULL,
`filetime` datetime DEFAULT NULL,
PRIMARY KEY (`did`,`filename`),
KEY `fe` (`filetime`,`ext`), -- This?
KEY `ef` (`ext`,`filetime`) -- or This?
) ENGINE=InnoDB DEFAULT CHARSET=utf8 ;
There are a million rows in the table. The filetimes are mostly distinct, and there are only a small number of distinct ext values. So filetime has a high cardinality and ext has a much lower cardinality.
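A quick way to check those cardinality claims on your own copy of the table is sketched below. The numbers depend entirely on your data, and the exact `COUNT(DISTINCT ...)` query scans the whole table.

```sql
-- Exact counts; this reads every row, so expect it to take a moment.
SELECT COUNT(*)                 AS total_rows,
       COUNT(DISTINCT ext)      AS distinct_ext,
       COUNT(DISTINCT filetime) AS distinct_filetime
    FROM files;

-- The optimizer's sampled estimates: the Cardinality column gives the
-- estimated number of distinct values for each index prefix.
SHOW INDEX FROM files;
```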
The query involves both ext and filetime:
WHERE ext = '...'
AND filetime BETWEEN ... AND ...
Which of those two indexes is better? And why?
First, let's use FORCE INDEX to pick either ef or fe. The timings are too short to give a clear picture of which is faster, but EXPLAIN shows a difference:
Forcing the index with the range column, filetime, first. (Note: the order of the conditions in the WHERE clause has no impact.)
mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
FROM files FORCE INDEX(fe)
WHERE ext = 'gif' AND filetime >= '2015-01-01'
AND filetime < '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
| 1 | SIMPLE | files | range | fe | fe | 14 | NULL | 16684 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+-------+-----------------------+
Forcing the index with the low-cardinality ext first:
mysql> EXPLAIN SELECT COUNT(*), AVG(fsize)
FROM files FORCE INDEX(ef)
WHERE ext = 'gif' AND filetime >= '2015-01-01'
AND filetime < '2015-01-01' + INTERVAL 1 MONTH;
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
| 1 | SIMPLE | files | range | ef | ef | 14 | NULL | 538 | Using index condition |
+----+-------------+-------+-------+---------------+------+---------+------+------+-----------------------+
Clearly, the rows column says ef is better. But let's check with the Optimizer trace. The output is rather bulky, so I'll show only the interesting parts. No FORCE is needed; the trace shows both options and then picks the better one.
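One way to capture such a trace is sketched below. It assumes MySQL 5.6 or later, where the optimizer trace is available. You enable tracing for the session, run the query, and then read the trace back from information_schema.

```sql
-- Turn tracing on for this session only.
SET SESSION optimizer_trace = 'enabled=on';

-- Run the query without FORCE INDEX so the optimizer weighs both indexes.
SELECT COUNT(*), AVG(fsize)
    FROM files
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
      AND filetime < '2015-01-01' + INTERVAL 1 MONTH;

-- By default, the trace of the most recent traced statement is kept here as JSON text.
SELECT TRACE FROM information_schema.OPTIMIZER_TRACE;

SET SESSION optimizer_trace = 'enabled=off';
```

In the mysql client, ending the last SELECT with \G instead of ; makes the JSON easier to read. If the trace is cut short, the MISSING_BYTES_BEYOND_MAX_MEM_SIZE column of the same table reports by how much, and raising optimizer_trace_max_mem_size should help.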
...
"potential_range_indices": [
...
{
"index": "fe",
"usable": true,
"key_parts": [
"filetime",
"ext",
"did",
"filename"
]
},
{
"index": "ef",
"usable": true,
"key_parts": [
"ext",
"filetime",
"did",
"filename"
]
}
],
...
"analyzing_range_alternatives": {
"range_scan_alternatives": [
{
"index": "fe",
"ranges": [
"2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 16684,
"cost": 20022, <-- Here's the critical number
"chosen": true
},
{
"index": "ef",
"ranges": [
"gif <= ext <= gif AND 2015-01-01 00:00:00 <= filetime < 2015-02-01 00:00:00"
],
"index_dives_for_eq_ranges": true,
"rowid_ordered": false,
"using_mrr": false,
"index_only": false,
"rows": 538,
"cost": 646.61, <-- Here's the critical number
"chosen": true
}
],
...
"attached_conditions_computation": [
{
"access_type_changed": {
"table": "`files`",
"index": "ef",
"old_type": "ref",
"new_type": "range",
"cause": "uses_more_keyparts" <-- Also interesting
}
}
With fe (range column first), only the filetime range could be used (note the single-column "ranges" entry), so it estimated scanning through 16684 rows, fishing for ext='gif'.
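With ef (low-cardinality ext first), it could use both columns of the index and drill down more efficiently in the BTree. It found an estimated 538 rows, all of which are useful for the query -- no further filtering needed. The estimated cost drops accordingly, from 20022 to 646.61. The "uses_more_keyparts" entry tells the same story: a plain ref lookup on ext alone was upgraded to a range scan on ef because the range uses more key parts.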
Conclusions:
- INDEX(filetime, ext) used only the first column.
- INDEX(ext, filetime) used both columns.
- Put columns involved in = tests first in the index, regardless of cardinality.
- The query plan won't use any index column after the first 'range' column to narrow the scan.
- "Cardinality" is irrelevant for composite indexes and this type of query.
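Applied to the table at the top, those conclusions favor keeping INDEX(ext, filetime). The sketch below assumes fe exists only for this query. A query that filters on a filetime range with no test on ext would still want filetime as a leading column, so check your other queries first.

```sql
-- Keep `ef` = INDEX(ext, filetime): the '=' column first, then the range column.
-- Drop `fe` = INDEX(filetime, ext) only if no other query needs filetime
-- as the leading column.
ALTER TABLE files DROP INDEX fe;
```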
("Using index condition" means Index Condition Pushdown: the Storage Engine (InnoDB) checks conditions on index columns beyond those used to delimit the range, before fetching the full row.)
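To see what Index Condition Pushdown contributes, you can switch it off for one session and compare actual row reads. The sketch below assumes MySQL 5.6 or later, where ICP and its optimizer_switch flag exist. With ICP on, InnoDB may discard non-matching index entries before the Handler_read% counters see them, so turning it off should make the fe versus ef difference easier to observe.

```sql
-- Disable Index Condition Pushdown for this session only.
SET SESSION optimizer_switch = 'index_condition_pushdown=off';

-- Reset this session's status counters, run the query, read the counters.
FLUSH STATUS;
SELECT COUNT(*), AVG(fsize)
    FROM files FORCE INDEX(fe)   -- then repeat these three statements with ef
    WHERE ext = 'gif' AND filetime >= '2015-01-01'
      AND filetime < '2015-01-01' + INTERVAL 1 MONTH;
-- Handler_read_next counts requests to read the next row in index order.
SHOW SESSION STATUS LIKE 'Handler_read%';

-- Turn ICP back on (its usual default).
SET SESSION optimizer_switch = 'index_condition_pushdown=on';
```

Comparing Handler_read_next between the two runs gives an actual count to set beside the EXPLAIN rows estimates (16684 vs 538). This is handy when, as here, the timings are too short to compare.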