PHP/MySQL with encoding problems

Posted 2019-01-11 15:36

Question:

I am having trouble with PHP regarding encoding.

I have a JavaScript/jQuery HTML5 page that interacts with my PHP script using $.post. However, PHP is showing a weird problem, probably related to encoding.

When I write

htmlentities("í")

I expect PHP to output &iacute;. However, it outputs &Atilde;&shy; instead. At first I thought I was making some mistake with the encodings; however,

htmlentities("í")=="&iacute;"?"Good":"Fail";

outputs "Fail", whereas

htmlentities("í")=="&Atilde;&shy;"?"Good":"Fail";

outputs "Good".

But htmlentities($search, null, "utf-8") works as expected.

I want to have PHP communicate with a MySQL server, but it has encoding problems too, even if I use utf8_encode. What should I do?

EDIT: On the SQL command, writing

SELECT id,uid,type,value FROM users,profile
WHERE uid=id AND type='name' AND value='XXX';

where XXX contains no í chars, works as expected, but it does not if there is any 'í' char.

SET NAMES 'utf8';
SET CHARACTER SET 'utf8';
SELECT id,uid,type,value FROM users,profile
WHERE uid=id AND type='name' AND value='XXX';

This not only fails for í chars, it ALSO fails for strings without any 'special' characters. Removing the ' chars from SET NAMES and SET CHARACTER SET doesn't seem to change anything.

I am connecting to the MySQL database using PDO.

EDIT 2: I am using MySQL version 5.1.30 of XAMPP for Linux.

EDIT 3: Running SHOW VARIABLES LIKE '%character%' from PhpMyAdmin outputs

character_set_client    utf8
character_set_connection    utf8
character_set_database  latin1
character_set_filesystem    binary
character_set_results   utf8
character_set_server    latin1
character_set_system    utf8
character_sets_dir  /opt/lampp/share/mysql/charsets/

Running the same query from my PHP script (with print_r) outputs:

Array
(
    [0] => Array
        (
            [Variable_name] => character_set_client
            [0] => character_set_client
            [Value] => latin1
            [1] => latin1
        )

    [1] => Array
        (
            [Variable_name] => character_set_connection
            [0] => character_set_connection
            [Value] => latin1
            [1] => latin1
        )

    [2] => Array
        (
            [Variable_name] => character_set_database
            [0] => character_set_database
            [Value] => latin1
            [1] => latin1
        )

    [3] => Array
        (
            [Variable_name] => character_set_filesystem
            [0] => character_set_filesystem
            [Value] => binary
            [1] => binary
        )

    [4] => Array
        (
            [Variable_name] => character_set_results
            [0] => character_set_results
            [Value] => latin1
            [1] => latin1
        )

    [5] => Array
        (
            [Variable_name] => character_set_server
            [0] => character_set_server
            [Value] => latin1
            [1] => latin1
        )

    [6] => Array
        (
            [Variable_name] => character_set_system
            [0] => character_set_system
            [Value] => utf8
            [1] => utf8
        )

    [7] => Array
        (
            [Variable_name] => character_sets_dir
            [0] => character_sets_dir
            [Value] => /opt/lampp/share/mysql/charsets/
            [1] => /opt/lampp/share/mysql/charsets/
        )

)

Running

SET NAMES 'utf8';
SET CHARACTER SET 'utf8';
SHOW VARIABLES LIKE '%character%'

outputs an empty array.

Answer 1:

It's very important to specify the encoding of htmlentities to match that of the input, as you did in your final example but omitted in the first three.

htmlentities($text,ENT_COMPAT,'utf-8');
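A minimal sketch of why the encoding argument matters, assuming the input is UTF-8, so "í" is the two bytes 0xC3 0xAD. The variable names are only for illustration.

```php
<?php
// "í" encoded as UTF-8 is the two bytes 0xC3 0xAD.
$text = "\xC3\xAD";

// Declared as UTF-8, the two bytes are read as one character.
$asUtf8 = htmlentities($text, ENT_COMPAT, 'UTF-8');

// Declared as ISO-8859-1, each byte is read as its own character:
// 0xC3 is "Ã" and 0xAD is the soft hyphen.
$asLatin1 = htmlentities($text, ENT_COMPAT, 'ISO-8859-1');

echo $asUtf8, "\n";   // &iacute;
echo $asLatin1, "\n"; // &Atilde;&shy;
```

Before PHP 5.4 the third argument defaulted to ISO-8859-1, which is consistent with the output described in the question.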

Regarding communication with MySQL, you need to make sure the connection collation and character set match the data you are transmitting. You can either set this in the configuration file, or at runtime using the following queries:

SET NAMES utf8;
SET CHARACTER SET utf8;
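Since the question uses PDO, here is a rough sketch of requesting UTF-8 when the connection is opened, rather than sending those queries by hand. The helper name utf8PdoSettings, the host and the database name are made up for the example. The charset DSN parameter is only honored by PHP 5.3.6 and later, and PDO::MYSQL_ATTR_INIT_COMMAND needs the pdo_mysql driver, so treat both as assumptions about your setup.

```php
<?php
// Hypothetical helper: builds the DSN and driver options for a UTF-8 connection.
function utf8PdoSettings($host, $dbname)
{
    // charset=utf8 in the DSN sets the connection character set (PHP >= 5.3.6).
    $dsn = "mysql:host={$host};dbname={$dbname};charset=utf8";

    $options = array();
    // Fallback for older setups: run SET NAMES right after connecting.
    // The constant only exists when the pdo_mysql driver is loaded.
    if (defined('PDO::MYSQL_ATTR_INIT_COMMAND')) {
        $options[PDO::MYSQL_ATTR_INIT_COMMAND] = 'SET NAMES utf8';
    }

    return array($dsn, $options);
}

list($dsn, $options) = utf8PdoSettings('localhost', 'mydb');
echo $dsn, "\n";

// Connecting needs real credentials, so it is left commented out:
// $pdo = new PDO($dsn, 'user', 'password', $options);
```

Setting the character set at connect time also keeps it out of the query string, so a later SELECT or SHOW VARIABLES is sent as a statement of its own.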

Make sure the table, database and server character sets match as well. There is one setting you can't change at run-time, and that's the server's character set. You need to modify it in the configuration file:

[mysqld]
character-set-server = utf8
default-character-set = utf8 
skip-character-set-client-handshake

Read more on character sets and collations in MySQL in the manual.



Answer 2:

Late revival. But for further reference here are some extra tips:

  1. Use mysql_set_charset instead of SET xxx
  2. Make sure you are saving the file with UTF-8 encoding (this is often overlooked)
  3. Set headers:
    <?php header("Content-type: text/html; charset=utf-8"); ?>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

  4. If your Apache server configuration contains an AddDefaultCharset directive with a different encoding, go yell at your host administrator.
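
For tip 2, one quick way to check whether a string (a literal from your source file, or a value received from the browser) really is UTF-8 is a regex with the u modifier, which fails on invalid byte sequences. A small sketch; the function name isValidUtf8 is just for illustration.

```php
<?php
// Hypothetical helper: true when $s is a well-formed UTF-8 byte sequence.
function isValidUtf8($s)
{
    // With the /u modifier PCRE rejects subjects that are not valid UTF-8,
    // so preg_match returns false instead of 1.
    return preg_match('//u', $s) === 1;
}

var_dump(isValidUtf8("\xC3\xAD")); // "í" saved as UTF-8: bool(true)
var_dump(isValidUtf8("\xED"));     // "í" saved as ISO-8859-1: bool(false)
```

Plain ASCII text passes this check under either encoding, so test it with a string that contains at least one accented character.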


Answer 3:

I just ran into this issue. I have a whole website's content in Spanish, with all the special characters you can expect (áéíóúñ) and their capital letter versions.

In my case it was an inconsistency in the server charset/collation. Everything was set to utf8 except the server charset, which was latin1. This caused all utf8 data entered in the database to display in its raw encoded form, e.g. í would show up as an A with tilde (Ã) ...

I am using mysqli, and to fix it, I made use of the method explained above by Anthony Accioly (using mysql_set_charset). Said method has a mysqli version and that is what I used.

After that, I was puzzled. I still had a mess when viewing my website. Of course, I didn't know that by changing that latin1 to utf8 I would also mess up the character encode/decode of the whole thing. So I used the help of an online string encoder/decoder to fix my table data.

I made various exports of all my content data (you can set them up to produce update queries, which makes the update process faster) and ran the SQL output through the aforementioned online encoder/decoder, then copy-pasted the fixed queries into the phpMyAdmin SQL panel, thus fixing my encoding errors. Everything is now how it should be, AND I am able to run lossy (accent-insensitive) searches again: Maria, maria, maría, mariá will all match maría, maria, Maria, etc. All accented characters evaluate to their base vowel character. Epic Win.
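
The kind of repair an online encoder/decoder performs can be sketched in PHP. This assumes the damage is the common case where UTF-8 bytes were read as latin1 and then encoded to UTF-8 a second time, and that the iconv extension is available. It is not necessarily what the tool I used does, and the function name is invented for the example.

```php
<?php
// Hypothetical helper: undo one round of "UTF-8 read as latin1, then re-encoded".
function undoDoubleEncoding($s)
{
    // Converting back to ISO-8859-1 recovers the original UTF-8 bytes.
    return iconv('UTF-8', 'ISO-8859-1', $s);
}

$original = "mar\xC3\xADa"; // "maría" in UTF-8

// Simulate the damage: treat each UTF-8 byte as a latin1 character.
$damaged = iconv('ISO-8859-1', 'UTF-8', $original);

echo bin2hex($damaged), "\n";
echo undoDoubleEncoding($damaged) === $original ? "repaired" : "not repaired", "\n";
```

Only run something like this on data you know is damaged, and on a copy first: text that was already correct can fail to convert or come out mangled.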