I have a following table
CREATE TABLE `test_series_analysis_data` (
`email` varchar(255) NOT NULL,
`mappingId` int(11) NOT NULL,
`packageId` varchar(255) NOT NULL,
`sectionName` varchar(255) NOT NULL,
`createdAt` datetime(3) DEFAULT NULL,
`marksObtained` float NOT NULL,
`updatedAt` datetime DEFAULT NULL,
`testMetaData` longtext,
PRIMARY KEY (`email`,`mappingId`,`packageId`,`sectionName`),
KEY `rank_index` (`mappingId`,`packageId`,`sectionName`,`marksObtained`),
KEY `mapping_package` (`mappingId`,`packageId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
Following is the output of the explain for the queries:
explain select rank
from (
select email, @i:=@i+1 as rank
from test_series_analysis_data ta
join (select @i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | rank_index,mapping_package | rank_index | 4 | const | 20160 | 1.00 | Using where; Using index |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+----------------------------+-------------+---------+-------+-------+----------+--------------------------+
Query optimizer could have used both indexes but rank_index is a covering index so it got picked. What surprises me is the output of the following query:
explain select rank
from (
select email, @i:=@i+1 as rank
from test_series_analysis_data ta use index (mapping_package)
join (select @i:=0) va
where mappingId = ?1
and packageId = ?2
and sectionName = ?3
order by marksObtained desc
) as inter
where inter.email = ?4;
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 767 | const | 10 | 100.00 | NULL |
| 2 | DERIVED | <derived3> | NULL | system | NULL | NULL | NULL | NULL | 1 | 100.00 | Using filesort |
| 2 | DERIVED | ta | NULL | ref | mapping_package | mapping_package | 4 | const | 19434 | 1.00 | Using index condition |
| 3 | DERIVED | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | No tables used |
+----+-------------+------------+------------+--------+-----------------+-----------------+---------+-------+-------+----------+-----------------------+
Why are there rows
lesser (19434<20160) when the index being used is mapping_package. rank_index can better select what is required so the row count should be lesser in rank_index.
So does this mean mapping_package index is better than rank_index for the given query?
Does it have any effect that sectionName is a varchar so both indexes should give similar performance?
Also I am assuming Using index condition
is selecting only few rows from index and scanning some more. While in case Using where; Using index
, optimizer has to only read the index and not table to get rows and then it is selecting some data. Then why Using where
is missing while using rank_index?
Moreover why is the key_len for mapping_package is 4 when there are only two columns in the index?
Help appreciated.
(19434<20160)
-- Both of those numbers are estimates. It is unusual for them to be that close. I'll bet if you didANALYZE TABLE
, both would change, possibly changing the inequality.Notice something else:
Using where; Using index
versusUsing index condition
.But first, let me remind you that, in InnoDB, the
PRIMARY KEY
columns are tacked onto the secondary key. So, effectively you haveNow let's decide what the optimal index should be: where mappingId = ?1 and packageId = ?2 and sectionName = ?3 order by marksObtained desc
=
parts ofWHERE
:mappingId
,packageId
,sectionName
, in any order;ORDER BY
column(s):marksObtained
email
(the only other column mentioned anywhere in theSELECT
) is in the key, it will be "Covering".This says that
rank_index
is "perfect", and the other index is not so good. Alas,EXPLAIN
does not clearly say that.You, too, could have figured this out -- all you needed is to study my blog: http://mysql.rjweb.org/doc.php/index_cookbook_mysql (Sorry; it's getting late, and I am getting cheeky.)
Other tips:
(255)
. When a tmp table is needed, this can make the the tmp table bigger, hence less efficient. Lower the limit to something reasonable. Or...SMALLINT UNSIGNED
. This will improve performance in other ways, such as decreasing costly I/O. (OK, 20 rows is pretty small, so this may not apply.)Why is
key_len
4? That implies that one column was used, namely the 4-byteINT
mappingId
. I would have expected it to use the second column, too. So, I am stumped.EXPLAIN FORMAT=JSON SELECT ...
may provide more clues.