MySQL returns incorrect UTF8 extended characters i

2019-08-10 17:02发布

Note: In the following question you may see ? or blocks instead of characters, this is because you don't have the appropriate font. Please ignore this.

Background

I have a table with data structured as follows:

CREATE TABLE `decomposition_dup` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `parent` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
 `structure` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
 `child` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
 PRIMARY KEY (`id`),
 KEY `parent` (`parent`),
 KEY `child` (`child`),
 KEY `parent_2` (`parent`,`child`)
) ENGINE=InnoDB AUTO_INCREMENT=211929 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

And some example data:

INSERT INTO `decomposition_dup` (`id`, `parent`, `structure`, `child`) VALUES
(154647, '锦', 'a', '钅'),
(154648, '锦', 'a', '帛'),
(185775, '钅', 'd', '二'),
(185774, '钅', 'd', '㇟'),
(21195, '钅', 'd', '                

2条回答
欢心
2楼-- · 2019-08-10 17:32

If you want them to be the same, set the COLLATION of the columns to utf8mb4_unicode_ci or utf8mb4_unicode_520_ci.
If you want them to be different, use utf8mb4_general_ci, instead:

mysql> SELECT CONVERT(UNHEX('e99285') USING utf8mb4) =
    ->        CONVERT(UNHEX('e2bb90') USING utf8mb4) COLLATE utf8mb4_general_ci AS general;
+---------+
| general |
+---------+
|       0 |
+---------+

mysql> SELECT CONVERT(UNHEX('e99285') USING utf8mb4) =
    ->        CONVERT(UNHEX('e2bb90') USING utf8mb4) COLLATE utf8mb4_unicode_ci AS unicode;
+---------+
| unicode |
+---------+
|       1 |
+---------+

mysql> SELECT CONVERT(UNHEX('e99285') USING utf8mb4) =
    ->        CONVERT(UNHEX('e2bb90') USING utf8mb4) COLLATE utf8mb4_unicode_520_ci AS unicode_520;
+-------------+
| unicode_520 |
+-------------+
|           1 |
+-------------+
查看更多
放荡不羁爱自由
3楼-- · 2019-08-10 17:34

From what I can make out the problem lies within the SQL side of things upon research you'll see that this error code means that

MySQL's utf8 permits only the Unicode characters that can be represented with 3 bytes in UTF-8.

so It might be the characters you are using within the cases of SQL

查看更多
登录 后发表回答