Converting Unicode code points to UTF-8

Posted 2019-08-01 16:06

Question:

I currently have a string of escaped code points such as \u4eac\u90fd, and I want to convert it to UTF-8 so that I can insert it into a database.

Answer 1:

Most likely, the \u escape sequence was already sent by the web browser. That would be the original source of your problem: you need to make the web browser stop doing that.

For that, you need to make sure that the browser knows which encoding to use when submitting the form. By default, the browser uses the encoding of the HTML page that contains the form. Make sure that this page is encoded in UTF-8 and has a UTF-8 charset declaration in a meta header. With that done, the browser should submit UTF-8 data correctly, and you shouldn't need to convert anything at all.
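
As a rough illustration of the advice above, here is a minimal PHP sketch of a page that declares UTF-8 both in the HTTP header and in a meta tag. The function name render_form_page and the field name city are hypothetical, and the accept-charset attribute is an optional extra that the answer itself does not mention.

```php
<?php
// Hypothetical helper: returns a minimal HTML page whose form will be
// submitted as UTF-8 because the page itself is declared as UTF-8.
function render_form_page(): string
{
    return <<<'HTML'
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Form</title>
</head>
<body>
  <form method="post" accept-charset="UTF-8">
    <input type="text" name="city">
    <button type="submit">Send</button>
  </form>
</body>
</html>
HTML;
}

// Declare the encoding in the HTTP header as well as in the meta tag.
header('Content-Type: text/html; charset=UTF-8');
echo render_form_page();
```

The source file itself also has to be saved as UTF-8 for the declaration to be truthful.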



Answer 2:

http://hsivonen.iki.fi/php-utf8/



Answer 3:

json_decode('"\u4eac\u90fd"');

Credit for the JSON approach goes to @bobince (https://stackoverflow.com/a/7107750), whose answer covers the reverse direction (UTF-8 to code points). In that direction ASCII characters are not converted to code points, but json_decode does convert ASCII code points to characters, e.g. '"\u0041"' -> 'A'.

(Remember that you need the double quotes inside your string. I was confused why json_decode('\u4eac\u90fd'); was giving no output: without the quotes the input is not valid JSON, so json_decode returns null :-)
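
The quoting step can be wrapped in a small helper. This is only a sketch: the function name decode_unicode_escapes is made up for illustration, and it assumes the input contains nothing but \uXXXX escapes and ordinary text, with no raw double quotes, stray backslashes, or control characters.

```php
<?php
// Hypothetical helper: decode \uXXXX escapes by treating the input as the
// body of a JSON string literal. Assumes the input has no raw double quotes
// or other characters that would need escaping inside JSON.
function decode_unicode_escapes(string $escaped): ?string
{
    // Single quotes keep the backslashes literal; the added double quotes
    // turn the text into a valid JSON string.
    $decoded = json_decode('"' . $escaped . '"');

    // json_decode returns null when the input is not valid JSON.
    return is_string($decoded) ? $decoded : null;
}

$utf8 = decode_unicode_escapes('\u4eac\u90fd');
echo $utf8, PHP_EOL;          // the two characters, encoded as UTF-8
echo bin2hex($utf8), PHP_EOL; // e4baace983bd (three bytes per character)
```

Returning null instead of an empty string makes a failed decode easy to tell apart from a genuinely empty input.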

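Note that code points above U+FFFF, which have 5 or 6 hexadecimal digits and take 4 bytes in UTF-8, need special handling. JSON has no curly-brace form such as \u{...}; it writes such a code point as two \uXXXX escapes that together form a UTF-16 surrogate pair.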
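
To make the surrogate-pair point concrete, here is a short sketch using U+1F600 as an arbitrary example code point. The helper name surrogate_escape is hypothetical, and the "\u{...}" string syntax used for comparison requires PHP 7 or later.

```php
<?php
// Hypothetical helper: build the JSON escape (a UTF-16 surrogate pair) for a
// code point above U+FFFF. Assumes 0x10000 <= $codePoint <= 0x10FFFF.
function surrogate_escape(int $codePoint): string
{
    $offset = $codePoint - 0x10000;
    $high = 0xD800 + ($offset >> 10);   // top 10 bits
    $low  = 0xDC00 + ($offset & 0x3FF); // bottom 10 bits
    return sprintf('\u%04x\u%04x', $high, $low);
}

$escape = surrogate_escape(0x1F600);
echo $escape, PHP_EOL;         // \ud83d\ude00

// json_decode combines the pair into a single 4-byte UTF-8 character.
$emoji = json_decode('"' . $escape . '"');
echo bin2hex($emoji), PHP_EOL; // f09f9880

// PHP's own double-quoted string syntax does use curly braces.
var_dump($emoji === "\u{1F600}");
```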

echo json_encode('