MySQL varchar index length

2019-01-08 15:08发布

问题:

I have a table like this:

CREATE TABLE `products` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(512) NOT NULL,
  `description` text,
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=38 DEFAULT CHARSET=utf8;

and one like this:

CREATE TABLE `product_variants` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `product_id` int(11) unsigned NOT NULL,
  `product_code` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `product_code` (`product_code`),
  KEY `product_variant_product_fk` (`product_id`),
  CONSTRAINT `product_variant_product_fk` FOREIGN KEY (`product_id`) REFERENCES `products` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1037 DEFAULT CHARSET=utf8;

and an SQL statement like this

SELECT p.id AS id, p.name AS name, p.description AS description, pv.id AS product_variant_id, pv.product_code AS product_code
FROM products p
INNER JOIN product_variants pv ON pv.product_id = p.id
ORDER BY p.name ASC
LIMIT 300 OFFSET 0;

which if I explain gives me this:

+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
| id | select_type | table | type | possible_keys              | key                        | key_len | ref     | rows   | Extra          |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY                    | NULL                       | NULL    | NULL    | 993658 | Using filesort |
|  1 | SIMPLE      | pv    | ref  | product_variant_product_fk | product_variant_product_fk | 4       | db.p.id |      1 |                |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
2 rows in set (0.00 sec)

For a million rows, this is pretty slow. I've tried adding an index on products.name with:

ALTER TABLE products ADD INDEX `product_name_idx` (name(512));

which gives this:

mysql> show indexes from products;
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| products |          0 | PRIMARY          |            1 | id              | A         |      993658 |     NULL | NULL   |      | BTREE      |         |               |
| products |          1 | product_manf_fk  |            1 | manufacturer_id | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| products |          1 | product_name_idx |            1 | name            | A         |         201 |      255 | NULL   |      | BTREE      |         |               |
+----------+------------+------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

I think that the Sub_part column shows the prefix that has been in indexed (in bytes), as described on this page.

When I re-explain the query, I get:

+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
| id | select_type | table | type | possible_keys              | key                        | key_len | ref     | rows   | Extra          |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
|  1 | SIMPLE      | p     | ALL  | PRIMARY                    | NULL                       | NULL    | NULL    | 993658 | Using filesort |
|  1 | SIMPLE      | pv    | ref  | product_variant_product_fk | product_variant_product_fk | 4       | db.p.id |      1 |                |
+----+-------------+-------+------+----------------------------+----------------------------+---------+---------+--------+----------------+
2 rows in set (0.00 sec)

which looks like the new index is not being used. As described on this page, indexes will not be used for sorting if they are prefix indexes. In fact if I truncate the data with:

alter table products modify `name`  varchar(255) not null;

The explain gives:

+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+
| id | select_type | table | type  | possible_keys              | key                        | key_len | ref                                          | rows | Extra |
+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+
|  1 | SIMPLE      | p     | index | PRIMARY                    | product_name_idx           | 767     | NULL                                         |  300 |       |
|  1 | SIMPLE      | pv    | ref   | product_variant_product_fk | product_variant_product_fk | 4       | oh_2c98c233_69fe_4f06_ad0d_fe6f85a5beac.p.id |    1 |       |
+----+-------------+-------+-------+----------------------------+----------------------------+---------+----------------------------------------------+------+-------+

which I think backs that up. However, it says on this page that InnoDB tables can have up to 767 bytes of index. If the length is in bytes, why does it refuse to have more than 255? If it's in characters, how is it deciding the length of each UTF-8 character? Is it just assuming 3?

Also, am using this version of MySQL:

mysql> select version();
+------------+
| version()  |
+------------+
| 5.5.27-log |
+------------+
1 row in set (0.00 sec)

回答1:

I must revise my answer due to my research. I originally posted this (quoting myself):

I believe the answer is that you cannot know how many characters will be in the index because you cannot know how many bytes your characters will be (unless you do something to exclude multi-byte characters).

And I'm not sure, but it might still be correct, but not in quite the way I was thinking.

Here is the correct answer:

MySQL assumes 3 bytes per utf8 character. 255 characters is the maximum index size you can specify per column, because 256x3=768, which breaks the 767 byte limit.

If you don't specify index size, MySQL chooses the maximum size (i.e. 255 per column). A UNIQUE constraint cannot be put on a utf8 column whose length is greater than 255, because a unique index must contain the entire cell value. But a regular index can be used - it will just index the first 255 characters (or first 767 bytes?). And that is where there is still some mystery for me.

The MySTERY: I can see why MySQL assumes 3 bytes per character, for safety, because otherwise the UNIQUE constraint could be broken. But the docs seem to suggest that the index is actually sized in bytes, not characters. So, suppose you put a 255 char (765 byte) index on a varchar(256) column. If the characters you store are all ASCII, 1-byte characters, like A-Z, a-z, 0-9, then then you can fit the entire column into the 767 byte index. And it seems like that is what would actually happen.

Below is some more information from my original answer about characters, bytes, etc.


According to wikipedia, UTF-8 character can be 1,2, 3, or 4 bytes long. But, according to this mysql documentation, the maximium character size is 3 bytes, and so any column index index over 255 characters might hit that byte limit. But as I understand it, it might not. If most of your characters are in the ASCII range, then your average character size will be closer to 1 byte. If your average character size is, for example, 1.3 bytes (mostly 1 byte, but a significant number of 2-3 byte characters), then you could specify an index of 767/1.3

So, if you are storing mostly 1-byte characters, your actual character limit would be more like: 767 / 1.3 = 590. But it turns out that is not the way it works. 255 characters is the limit.

As mentioned in this MySQL documentation,

Prefix limits are measured in bytes, whereas the prefix length in CREATE INDEX statements is interpreted as number of characters for nonbinary data types (CHAR, VARCHAR, TEXT). Take this into account when specifying a prefix length for a column that uses a multi-byte character set.

It seems that MySQL is advising people to do a calculation/guestimation like I just did in order to determine your key size for a varchar column. But in fact you cannot specify an index larger than 255 for utf8 columns.

Finally, if you refer back to my second link again, there is also this:

When the innodb_large_prefix configuration option is enabled, this length limit is raised to 3072 bytes, for InnoDB tables that use the DYNAMIC and COMPRESSED row formats.

So it seems like you can get much larger indexes if you want to, with a bit of tweaking. Just make sure the row formats are DYNAMIC or COMPRESSED. You can probably specify an index of 1023 or 1024 characters in that case.


By the way, it turns out that you can store 4-byte characters using the utf8mb4 character set. The utf8 character set apparently stores only "plane 0" characters.

EDIT:

I just tried to create a composite index on a varchar(511) column with a tinyint(1) column and got the error message saying the max index size was 767 bytes. This makes me believe that MySQL assumes utf8 character set columns will contain 3 bytes per character (the maximum), and allows you to use 255 chars max. But perhaps that is only with composite indexes. I will update my answer as I find out more. But for now I'm leaving this as an edit.



回答2:

Limits on InnoDB Tables

Warning

Do not convert MySQL system tables in the mysql database from MyISAM to InnoDB tables. This is an unsupported operation. If you do this, MySQL does not restart until you restore the old system tables from a backup or re-generate them with the mysql_install_db program.

Warning

It is not a good idea to configure InnoDB to use data files or log files on NFS volumes. Otherwise, the files might be locked by other processes and become unavailable for use by MySQL.

Maximums and Minimums

  1. A table can contain a maximum of 1000 columns.
  2. A table can contain a maximum of 64 secondary indexes.
  3. By default, an index key for a single-column index can be up to 767 bytes. The same length limit applies to any index key prefix. For example, you might hit this limit with a column prefix index of more than 255 characters on a TEXT or VARCHAR column, assuming a UTF-8 character set and the maximum of 3 bytes for each character. When the innodb_large_prefix configuration option is enabled, this length limit is raised to 3072 bytes, for InnoDB tables that use the DYNAMIC and COMPRESSED row formats.
  4. If you specify an index prefix length that is greater than the allowed maximum value, the length is silently reduced to the maximum length. In MySQL 5.6 and later, specifying an index prefix length greater than the maximum length produces an error.

When innodb_large_prefix is enabled, attempting to create an index prefix with a key length greater than 3072 for a REDUNDANT or COMPACT table causes an ER_INDEX_COLUMN_TOO_LONG error.

The InnoDB internal maximum key length is 3500 bytes, but MySQL itself restricts this to 3072 bytes. This limit applies to the length of the combined index key in a multi-column index.

The maximum row length, except for variable-length columns (VARBINARY, VARCHAR, BLOB and TEXT), is slightly less than half of a database page. That is, the maximum row length is about 8000 bytes. LONGBLOB and LONGTEXT columns must be less than 4GB, and the total row length, including BLOB and TEXT columns, must be less than 4GB.

Reference: InnoDB Restrictions