PHP/JSON: decode UTF-8?

Posted 2019-08-02 19:34

Question:

I store a JSON string that contains some (Chinese?) characters in a MySQL database. Example of what's in the database:

normal.text.\u8bf1\u60d1.rest.of.text

On my PHP page I just call json_decode on what I receive from MySQL, but it doesn't display correctly; it shows things like "½±è§�".

I've tried executing the "SET NAMES 'utf8'" query at the beginning of my file, but it didn't change anything. I already have the following meta tag on my webpage:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

And of course all my PHP files are encoded in UTF-8.

Do you have any idea how to display these "\uXXXX" characters nicely?

Answer 1:

Unicode is not UTF-8!

$ echo -en '\x8b\xf1\x60\xd1\x00\n' | iconv -f unicodebig -t utf-8
诱惑

This is a strange "encoding" you have. I guess each character of the normal text is one byte long (US-ASCII)? Then you have to extract the \uXXXX sequences, convert each sequence into a two-byte character, and convert that character to UTF-8 with iconv("unicodebig", "utf-8", $character) (see iconv in the PHP documentation). This worked on my side:

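<?php
// Single quotes keep the backslashes literal, as they are stored in the database.
$in = 'normal.text.\u8bf1\u60d1.rest.of.text';

// Convert one matched \uXXXX escape into its UTF-8 character.
// Handles code points in the Basic Multilingual Plane only, not surrogate pairs.
function ewchar_to_utf8($matches) {
    $ewchar = $matches[1];
    $binwchar = hexdec($ewchar);
    // Pack the code point as two bytes, most significant byte first.
    $wchar = chr(($binwchar >> 8) & 0xFF) . chr($binwchar & 0xFF);
    return iconv("unicodebig", "utf-8", $wchar);
}

// Replace every \uXXXX escape in $str with the corresponding UTF-8 character.
function special_unicode_to_utf8($str) {
    return preg_replace_callback('/\\\\u([[:xdigit:]]{4})/i', "ewchar_to_utf8", $str);
}

echo special_unicode_to_utf8($in);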

Otherwise we need more information on how the string in your database is encoded.
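
If the stored value uses JSON's own \uXXXX escapes, an alternative to converting them by hand is to wrap the fragment in double quotes so it becomes a JSON string literal, and let json_decode do the work (it also handles surrogate pairs). A minimal sketch, assuming the fragment contains no unescaped double quotes or raw control characters; the function name decode_json_escapes is hypothetical:

```php
<?php
// Hypothetical helper: decode JSON-style \uXXXX escapes in a bare fragment.
// Assumes $fragment has no unescaped double quotes or raw control characters;
// if json_decode() cannot parse it, the input is returned unchanged.
function decode_json_escapes($fragment) {
    $decoded = json_decode('"' . $fragment . '"');
    return is_string($decoded) ? $decoded : $fragment;
}

// Single quotes keep the backslashes literal, as they would be in the database.
echo decode_json_escapes('normal.text.\u8bf1\u60d1.rest.of.text'), "\n";
```

If the database column already holds a complete JSON document, calling json_decode on it directly is enough and this wrapper is not needed.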



Answer 2:

This seems to work fine for me, with PHP 5.3.5 on Ubuntu 11.04:

<?php
header('Content-Type: text/plain; charset="UTF-8"');
$json = '[ "normal.text.\u8bf1\u60d1.rest.of.text" ]';

$decoded = json_decode($json, true);

var_dump($decoded);

Outputs this:

array(1) {
  [0]=>
  string(31) "normal.text.诱惑.rest.of.text"
}
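
Since the decode itself works, one way to check whether a garbled display comes from a later stage (connection or output charset) rather than from json_decode is to look at the raw bytes. A small sketch; the helper name is_valid_utf8 is hypothetical, and it relies on preg_match with the /u modifier failing on invalid UTF-8 input:

```php
<?php
// Hypothetical helper: true if $str is a valid UTF-8 byte sequence.
// With the /u modifier, preg_match() returns false on malformed UTF-8.
function is_valid_utf8($str) {
    return preg_match('//u', $str) === 1;
}

$decoded = json_decode('[ "\u8bf1\u60d1" ]', true);

// U+8BF1 and U+60D1 each take three bytes in UTF-8.
echo bin2hex($decoded[0]), "\n";               // e8afb1e68391
echo is_valid_utf8($decoded[0]) ? "valid UTF-8" : "not UTF-8", "\n";
```

If the bytes are correct here but the page still shows garbage, the string is fine and the browser is being told the wrong charset.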


Answer 3:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

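That's a red herring. If you serve your page over HTTP and the response contains a Content-Type header that declares a charset, the meta tag is ignored. PHP sets such a header by default if you don't do it explicitly, and the charset that ends up in effect can be ISO-8859-1 rather than UTF-8, depending on the PHP version and the default_charset setting (since PHP 5.6, default_charset defaults to UTF-8).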

Try with this line:

<?php
header("Content-Type: text/html; charset=UTF-8");
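
A slightly more defensive variant of the same idea, sketched under the assumption that it runs before any output; the function name send_utf8_html_header is hypothetical. It also sets default_charset, the ini setting PHP uses when it generates the Content-Type header itself:

```php
<?php
// Hypothetical helper: declare UTF-8 for the HTTP response.
// Returns the header line it sent, or null if output had already started.
function send_utf8_html_header() {
    $line = 'Content-Type: text/html; charset=UTF-8';
    if (headers_sent()) {
        return null; // too late: headers must precede any output
    }
    ini_set('default_charset', 'UTF-8');
    header($line);
    return $line;
}

send_utf8_html_header();
```

Calling it at the very top of the script, before any HTML or whitespace is emitted, avoids the "headers already sent" case.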