I have inherited a database which contains strings such as:
\u5353\u8d8a\u4e9a\u9a6c\u900a: \u7f51\u4e0a\u8d2d\u7269: \u5728\u7ebf\u9500\u552e\u56fe\u4e66\uff0cDVD\uff0cCD\uff0c\u6570\u7801\uff0c\u73a9\u5177\uff0c\u5bb6\u5c45\uff0c\u5316\u5986
The question is: how do I get this to display properly in an HTML page?
I'm using PHP5 to process the strings.
1) I downloaded and installed a Unicode font named CODE2000
2) I wrote this:
<?php header('Content-Type: text/html; charset=utf-8'); ?>
<html>
<head></head>
<body style="font-family: CODE2000">
<?php
// I had to remove some strings like ': ', 'DVD', 'CD' to make it pure \uXXXX format
$s = '\u5353\u8d8a\u4e9a\u9a6c\u900a\u7f51\u4e0a\u8d2d\u7269\u5728\u7ebf\u9500\u552e\u56fe\u4e66\uff0c\uff0c\uff0c\u6570\u7801\uff0c\u73a9\u5177\uff0c\u5bb6\u5c45\uff0c\u5316\u5986';
$chars = explode('\\u', $s);
foreach ($chars as $char) {
    if ($char === '') continue; // explode() returns an empty first element
    // \uXXXX escapes are big-endian UTF-16 code units
    $c = iconv('UTF-16BE', 'UTF-8', hex2str($char));
    print $c;
}
// Turns a hex string such as "5353" into the raw bytes "\x53\x53"
function hex2str($hex) {
    $r = '';
    for ($i = 0; $i < strlen($hex) - 1; $i += 2)
        $r .= chr(hexdec($hex[$i] . $hex[$i + 1]));
    return $r;
}
?>
</body>
</html>
3) It produced these characters: http://img267.imageshack.us/img267/9759/49139858.png, which could be correct. E.g. the 1st character (U+5353) is indeed 卓, while the 2nd one (U+8D8A) is 越. Of course I cannot be 100% sure, but it seems to fit. Maybe you can take it from here.
That was a good exercise :)
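An alternative worth trying, offered as a sketch rather than as part of the answer above: \uXXXX is the same escape syntax JSON uses, so wrapping the stored text in double quotes and passing it to json_decode() decodes every escape while leaving plain text such as ': ', 'DVD' and 'CD' untouched, which avoids having to strip those parts first. This assumes the stored strings contain no double quotes, control characters, or other backslash sequences that would make the result invalid JSON; decode_u_escapes() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: decode \uXXXX escapes by treating the text as a JSON string literal.
// Assumes $s contains no '"', no control characters and no other backslash escapes.
function decode_u_escapes($s) {
    $decoded = json_decode('"' . $s . '"');
    // json_decode() returns NULL for invalid JSON; fall back to the raw text
    return $decoded === null ? $s : $decoded;
}

echo decode_u_escapes('\u5353\u8d8a\u4e9a\u9a6c\u900a: \u7f51\u4e0a\u8d2d\u7269'), "\n"; // 卓越亚马逊: 网上购物
```

The fallback keeps a row readable (if undecoded) when it breaks the JSON assumption, instead of turning it into an empty string.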
PHP < 6 is woefully unaware of Unicode, so you have to do everything yourself:
Option 1. takes precedence over 2. I'm not sure where 3. fits in.
If you need to do any string processing before displaying the data, make sure you use the multibyte (mb_*) string functions. If you have Unicode data coming from other sources in other encodings, you'll need to convert it to UTF-8 with mb_convert_encoding().
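A short sketch of why the mb_* functions matter, assuming the mbstring extension is enabled. EUC-CN (the usual byte encoding of GB2312) is only an illustrative stand-in for "another encoding"; the question does not say what the other sources use.

```php
<?php
// Requires the mbstring extension.
mb_internal_encoding('UTF-8');

$title = "卓越亚马逊";                 // 5 characters, 15 bytes in UTF-8

echo strlen($title), "\n";             // 15: counts bytes
echo mb_strlen($title), "\n";          // 5: counts characters
echo mb_substr($title, 0, 2), "\n";    // 卓越 (substr($title, 0, 2) would cut 卓 in half)

// Simulate data arriving from a source that uses EUC-CN (assumed for illustration)...
$fromOtherSource = mb_convert_encoding($title, 'EUC-CN', 'UTF-8');
// ...and convert it back to UTF-8 before mixing it into the page.
$utf8 = mb_convert_encoding($fromOtherSource, 'UTF-8', 'EUC-CN');
var_dump($utf8 === $title);
```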
Based on daremon's submission, here is a "unicode_decode" function which converts \uXXXX escapes into their UTF-8 equivalents and leaves the rest of the string untouched.
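// preg_replace_callback() needs PHP 5.3+ (for the closure); it is used instead of
// the /e modifier, which is deprecated as of PHP 5.5 and removed in PHP 7.
function unicode_decode($str) {
    return preg_replace_callback('/\\\\u([0-9a-fA-F]{4})/', function ($m) {
        // $m[1] is the hex code unit, e.g. "5353"; \uXXXX escapes are big-endian UTF-16
        return iconv('UTF-16BE', 'UTF-8', hex2str($m[1]));
    }, $str);
}
// Turns a hex string such as "5353" into the raw bytes "\x53\x53"
function hex2str($hex) {
    $r = '';
    for ($i = 0; $i < strlen($hex) - 1; $i += 2)
        $r .= chr(hexdec($hex[$i] . $hex[$i + 1]));
    return $r;
}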
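Two refinements, sketched under assumptions rather than taken from the answers above. Characters outside the Basic Multilingual Plane (above U+FFFF, e.g. emoji) are written as two \uXXXX escapes forming a UTF-16 surrogate pair, and decoding each half on its own makes iconv() fail; the Chinese text in the question does not need this, but other rows might. Also, PHP's built-in pack('H*', ...) produces the same bytes as hex2str(). The name unicode_decode_pairs() is hypothetical.

```php
<?php
// Hypothetical variant of unicode_decode() that also decodes surrogate pairs,
// e.g. \ud83d\ude00 -> U+1F600. pack('H*', $hex) replaces hex2str().
function unicode_decode_pairs($str) {
    // Branch 1: high surrogate (D800-DBFF) followed by low surrogate (DC00-DFFF)
    // Branch 2: any other single \uXXXX escape
    $pattern = '/\\\\u(d[89ab][0-9a-f]{2})\\\\u(d[c-f][0-9a-f]{2})|\\\\u([0-9a-f]{4})/i';
    return preg_replace_callback($pattern, function ($m) {
        $hex = (isset($m[3]) && $m[3] !== '') ? $m[3] : $m[1] . $m[2];
        $out = @iconv('UTF-16BE', 'UTF-8', pack('H*', $hex));
        // A lone surrogate is not valid UTF-16; keep the original escape text
        return $out === false ? $m[0] : $out;
    }, $str);
}

echo unicode_decode_pairs('\u5353\u8d8a: DVD'), "\n"; // 卓越: DVD
```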