PDO query returns lots of \\uXXXX character codes

2020-05-09 22:41发布

问题:

I have a MySQL database table in which I store the names of countries in different languages, and I can't get the data to display in unicode characters - I can only get \uXXXX codes displayed where the special characters should be.

The query is used in an AJAX request with the results encoded as a JSON object.

Here is the table (truncated):

CREATE TABLE IF NOT EXISTS `tbl_countries` (
  `ccode` varchar(2) character set utf8 collate utf8_unicode_ci NOT NULL default '',
  `country_en` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL default '',
  `country_fr` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
  `country_de` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
  `country_es` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
  `country_ru` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
  `country_tr` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
  `country_ar` varchar(100) character set utf8 collate utf8_unicode_ci NOT NULL,
  PRIMARY KEY  (`ccode`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;

--
-- Dumping data for table `tbl_countries`
--

INSERT INTO `tbl_countries` (`ccode`, `country_en`, `country_fr`, `country_de`, `country_es`, `country_ru`, `country_tr`, `country_ar`) VALUES
('AF', 'Afghanistan', 'Afghanistan', 'Afghanistan', 'Afganistán', 'Афганистан', 'Afganistan', 'أفغانستان'),
('AX', 'Aland Islands', 'Îles Åland', 'Alandinseln', 'Islas Åland', 'Аландские острова', 'Aland Adaları', 'جزر أولان'),
('AL', 'Albania', 'Albanie', 'Albanien', 'Albania', 'Албания', 'Arnavutluk', 'ألبانيا'),
('DZ', 'Algeria', 'Algérie', 'Algerien', 'Argelia', 'Алжир', 'Cezayir', 'الجزائر'),
('AS', 'American Samoa', 'Samoa américaines', 'Amerikanisch-Samoa', 'Samoa Americana', 'Американское Самоа', 'Amerikan Samoası', 'ساموا الأمريكية');

This is the code to create the PDO:

$conn = new PDO("mysql:host=$dbhost;dbname=$dbname",
    $dbuser,
    $dbpass,
    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")
);

$return_arr = array ();

if ($conn) {
    $ac_term = $_GET['term'];
    $query = "SELECT * FROM `tbl_countries` WHERE `country_en` LIKE :term";
    $result = $conn->prepare ($query);
    $result->bindValue (":term", "%".$ac_term."%");
    $result->execute ();

    /* Retrieve and store in array the results of the query.*/
    while ($row = $result->fetch (PDO::FETCH_ASSOC)) {
        $row_array['country_en'] = $row['country_en'];
        $row_array['country_de'] = $row['country_de'];
        $row_array['country_es'] = $row['country_es'];
        $row_array['country_fr'] = $row['country_fr'];
        $row_array['country_ru'] = $row['country_ru'];
        $row_array['country_tr'] = $row['country_tr'];
        $row_array['country_ar'] = $row['country_ar'];
        $row_array['ccode'] = $row['ccode'];
        array_push ($return_arr, $row_array);
    }
}

unset ($conn);

echo json_encode ($return_arr);

At the start of the PHP script is the following line:

header('Content-Type: text/html; charset=utf-8');

This is the typical output I get if I enter the search term united%20king:

[{
   "country_en":"United Kingdom",
   "country_de":"Vereinigtes K\u00f6nigreich",
   "country_es":"Reino Unido",
   "country_fr":"Royaume-Uni",
   "country_ru":"\u0412\u0435\u043b\u0438\u043a\u043e\u0431\u0440\u0438\u0442\u0430\u043d\u0438\u044f",
   "country_tr":"Birle\u015fik Krall\u0131k",
   "country_ar":"\u0627\u0644\u0645\u0645\u0644\u0643\u0629 \u0627\u0644\u0645\u062a\u062d\u062f\u0629",
   "ccode":"GB"
}]

In the PHP code, I tried using htmlentities, but it only showed the special characters for the German output:

$row_array['country_de'] = htmlentities ($row['country_de'], ENT_QUOTES, "UTF-8");

What am I missing? Thanks for reading.

回答1:

It is not PDO but regular behavior of json_encode. In modern PHP versions you can turn it off, but it shouldn't be a problem either way.

I have no idea why you wanted to echo raw json out, but usually it is not intended to be echoed directly into HTML but rather used by some JS code. And JS can sort this encoding out. However, to reduce the amount of data, since 5.4 JSON_UNESCAPED_UNICODE flag can be used, .

Also let me suggest you to debug your code a little before asking a question.
If you want to check PDO's output, do it for PDO, not for json. Verify every step your program does, to find one which spoils the data.



回答2:

That is perfectly valid JSON; these escape sequences are the regular JSON way to encode non-ASCII characters. If you decode it on the client using a regular JSON decoder, you'll end up with the correct characters.