MySQL unicode literals

2019-02-06 13:23发布

问题:

I want to insert a record into MySQL that has a non-ASCII Unicode character, but I'm on a terminal that doesn't let me easily type non-ASCII characters. How do I escape a Unicode literal in MySQL's SQL syntax?

回答1:

See: http://bugs.mysql.com/bug.php?id=10199 (Bug #10199: "Allow Unicode escape sequence for string literals.") This request has been "Open" since 2005.

From http://eng.kaching.com/2009/10/mysql-unicode-escape-sequences.html though you can see the following example, which does actually seem to work, but requires you to know the actual byte-by-byte UTF8 encoding:

"You can also use the variable-length UTF-8 representation (convenient when, for example, copying from a utf-8 URL-encoded value like %E2%80%98)."

mysql> select _utf8 x'E28098';
+---+
| ‘ |
+---+


回答2:

This stored function provides the functionality MySQL is (apparently) missing, with a way to turn a literal code point into a character without having to already know the UTF-8 encoding.

If VARCHAR(1) seems strange, since utf8 characters in MySQL can be up to 3 bytes long, remember the size of VARCHAR is characters, not bytes. The function returns a single UTF-8-encoded character from the input value.

For hexadecimal literals, prepend 0x.

DELIMITER $$

DROP FUNCTION IF EXISTS `utf8_char` $$
CREATE FUNCTION `utf8_char`(v smallint unsigned) RETURNS VARCHAR(1) CHARSET utf8
NO SQL
DETERMINISTIC
BEGIN

-- http://stackoverflow.com/questions/3632410/mysql-unicode-literals/30675371#30675371

RETURN CHAR(CASE
            WHEN v <= 0x7F THEN v
            WHEN v <= 0x7FF THEN 0xC080 | ((v >> 6) << 8) | (v & 0x3F)
            WHEN v <= 0xFFFF THEN 0xE08080 | (((v >> 12) & 0x0F ) << 16)  | (((v >> 6) & 0x3F ) << 8) | (v & 0x3F)
            ELSE NULL END);

END $$

DELIMITER ;

Example output:

mysql> select utf8_char(8592) AS 'leftwards_arrow';
+-----------------+
| leftwards_arrow |
+-----------------+
| ←               |
+-----------------+
1 row in set (0.00 sec)

mysql> select utf8_char(0x2192) AS 'rightwards_arrow_hex';
+----------------------+
| rightwards_arrow_hex |
+----------------------+
| →                    |
+----------------------+
1 row in set (0.00 sec)