Converting Unicode code points to UTF-8

Posted 2019-08-01 15:47

I currently have a string like \u4eac\u90fd, and I want to convert it to UTF-8 so I can insert it into a database.

3 answers
贪生不怕死
Reply #3 · 2019-08-01 16:39

Most likely, the \u escape sequences were already sent by the web browser. If so, that is the root cause of your problem: you need to make the browser stop doing that.

To do that, make sure the browser knows which encoding to use when submitting the form. By default, the browser uses the encoding of the HTML page that contains the form. So make sure that page is encoded in UTF-8 and declares a UTF-8 charset in a meta tag. With that done, the browser should submit UTF-8 data correctly, and you shouldn't need to convert anything at all.
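A minimal sketch of serving the form page as UTF-8, assuming the PHP source file itself is saved as UTF-8. The renderForm() function and the "city" field are hypothetical names for illustration; the point is that the meta charset declaration and the Content-Type header both say UTF-8, so the browser submits UTF-8.

```php
<?php
// Hypothetical helper: returns the HTML page containing the form,
// with a UTF-8 charset declaration in the <head>.
function renderForm(): string
{
    return <<<HTML
<!DOCTYPE html>
<html>
<head>
  <meta charset="utf-8">
  <title>Example form</title>
</head>
<body>
  <form method="post">
    <input type="text" name="city">
    <button type="submit">Save</button>
  </form>
</body>
</html>
HTML;
}

// Send a matching HTTP header so it agrees with the meta tag.
// (When run from the CLI, header() has no visible effect.)
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo renderForm();
```

Declaring the charset in both places avoids the browser guessing a legacy encoding when one of them is missing.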

爷、活的狠高调
Reply #4 · 2019-08-01 16:41
$utf8 = json_decode('"\u4eac\u90fd"'); // "京都" as a UTF-8 string

Credit for the JSON trick goes to @bobince (https://stackoverflow.com/a/7107750), where the reverse conversion (UTF-8 to code points) is asked for. In that direction, ASCII characters are not converted to code points; json_decode, however, converts escaped ASCII code points back to characters too, e.g. '"\u0041"' -> 'A'.
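A small sketch of that asymmetry, assuming default flags for json_encode() (no JSON_UNESCAPED_UNICODE) and a UTF-8-encoded source file: encoding escapes only the non-ASCII characters, while decoding turns every \uXXXX escape back into a character.

```php
<?php
// json_encode() escapes only non-ASCII characters by default;
// the ASCII "A" is left as-is.
$encoded = json_encode('A京都');
echo $encoded, "\n"; // "A\u4eac\u90fd"

// json_decode() converts every \uXXXX escape, ASCII included.
$decoded = json_decode('"\u0041\u4eac\u90fd"');
echo $decoded, "\n"; // A京都

var_dump($decoded === 'A京都'); // bool(true)
```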

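(Remember that you need the double quotes inside your string, because a JSON string value must be quoted. I was confused about why json_decode('\u4eac\u90fd'); gave no output: without the quotes it is invalid JSON, so json_decode returns null :-)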
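To avoid that silent null, here is a sketch of a wrapper that adds the quotes and reports decoding errors. decodeUnicodeEscapes() is a hypothetical name, and it assumes the input contains only \uXXXX escapes and ordinary characters; an unescaped double quote or a stray backslash would make the JSON invalid and trigger the exception.

```php
<?php
// Hypothetical helper: wrap the escapes in double quotes so they form a
// JSON string literal, and throw instead of silently returning null.
function decodeUnicodeEscapes(string $escaped): string
{
    $result = json_decode('"' . $escaped . '"');
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new InvalidArgumentException(json_last_error_msg());
    }
    return $result;
}

// Without the quotes, the input is not valid JSON and decodes to null.
var_dump(json_decode('\u4eac\u90fd')); // NULL

echo decodeUnicodeEscapes('\u4eac\u90fd'), "\n"; // 京都
```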

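Note that characters outside the Basic Multilingual Plane need special handling: their code points are above U+FFFF (5 or 6 hexadecimal digits) and take 4 bytes in UTF-8. JSON has no curly-brace escape like PHP's "\u{1F600}"; such a character must be written as a UTF-16 surrogate pair of two \uXXXX escapes, e.g. \ud83d\ude00 for U+1F600.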
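A sketch of building that surrogate pair from a code point, assuming PHP 7 or later (needed for the "\u{...}" comparison). jsonEscapeForCodePoint() is a hypothetical helper, not a built-in; it emits lowercase hex, which json_decode() accepts just like uppercase. It does not validate its input (e.g. lone surrogates or values above U+10FFFF).

```php
<?php
// Hypothetical helper: build the JSON escape for a code point, splitting
// code points above U+FFFF into a UTF-16 surrogate pair.
function jsonEscapeForCodePoint(int $cp): string
{
    if ($cp < 0x10000) {
        return sprintf('\\u%04x', $cp);
    }
    $cp -= 0x10000;
    $high = 0xD800 + ($cp >> 10);   // top 10 bits
    $low  = 0xDC00 + ($cp & 0x3FF); // bottom 10 bits
    return sprintf('\\u%04x\\u%04x', $high, $low);
}

$escape = jsonEscapeForCodePoint(0x1F600); // \ud83d\ude00
$utf8   = json_decode('"' . $escape . '"');

// PHP's own double-quoted strings do accept the curly-brace form.
var_dump($utf8 === "\u{1F600}"); // bool(true)
echo strlen($utf8), "\n";         // 4 (bytes in UTF-8)
```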

echo json_encode('