UTF-8 problems with PHP DOM on Debian server

Posted 2019-09-15 18:12

I have a problem with UTF-8 strings in PHP on my Debian server.

Update in details

I've done a little more testing and the problem is now more specific. I updated the title and details to fit the situation better. Thanks for the responses, and sorry that the problem wasn't described clearly. The following script works fine on my local Windows machine but not on my Debian server:

<?php
header("Content-Type: text/html; charset=UTF-8");
$string = '<html><head></head><body>UTF-8: ÄÖÜ<br /></body></html>';
$document = new DOMDocument();
@$document->loadHTML($string);
echo $document->saveHTML();
echo $string;

As expected on my local machine the output is:

UTF-8: ÄÖÜ
UTF-8: ÄÖÜ

On my server the output is:

UTF-8: ÄÖÜ
UTF-8: ÄÖÜ

I wrote the script in Notepad++ as UTF-8 without BOM and transferred it over SSH. As guido noted, the string itself is properly UTF-8 encoded. There seems to be a problem with PHP DOM or maybe libxml, and the cause must be some setting, since it is machine-dependent.

Original question

I work locally with XAMPP on Windows and everything is fine. But when I deploy my project to the server, UTF-8 strings get garbled. In fact, when I upload this test script

echo utf8_encode('UTF-8 test: ÄÖÜ');

I get "ÃÃÃ". Also, when I connect to the server with PuTTY, I cannot type umlauts (ÄÖÜ) correctly in the shell. I have no idea whether this issue is even PHP-related.

Tags: php linux dom
6 answers
手持菜刀,她持情操
#2 · 2019-09-15 18:12

EDIT: answer for the updated question. Declare the charset in a meta tag so that loadHTML() treats the input as UTF-8:

<?php
header("Content-Type: text/html; charset=UTF-8");
$string = '<html><head>'
.'<meta http-equiv="content-type" content="text/html; charset=utf-8">'
.'</head><body>UTF-8: ÄÖÜ<br /></body></html>';
$document = new DOMDocument();
@$document->loadHTML($string);
echo $document->saveHTML();
echo $string;
?>
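To see what the parser actually understood, independent of how the browser renders the page, the text can be read back from the DOM and dumped as hex. This is only a sketch, assuming the file is saved as UTF-8 and the dom extension is loaded; the variable names are illustrative.

```php
<?php
// Assumption: this file is saved as UTF-8 and the dom extension is available.
$html = '<html><head>'
    . '<meta http-equiv="content-type" content="text/html; charset=utf-8">'
    . '</head><body>UTF-8: ÄÖÜ<br /></body></html>';

$document = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$document->loadHTML($html);
libxml_clear_errors();

// DOM strings are returned as UTF-8; the hex dump shows the exact bytes.
$body = $document->getElementsByTagName('body')->item(0);
$text = $body->textContent;
echo $text, "\n";
echo bin2hex($text), "\n";
```

If the umlauts come back as the byte pairs c384, c396 and c39c, the parser read the input as UTF-8.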

I suspect your input string may already be UTF-8. Try:

setlocale(LC_CTYPE, 'de_DE.UTF-8');
$s = "UTF-8 test: ÄÖÜ";
if (mb_detect_encoding($s, "UTF-8", true) === "UTF-8") {
    echo "No need to encode";
} else {
    $s = utf8_encode($s);
    echo "Encoded string $s";
}
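The effect of encoding a string that is already UTF-8 can be made visible with a hex dump. A small sketch, assuming the file is saved as UTF-8; it uses mb_convert_encoding() from ISO-8859-1, which performs the same conversion as utf8_encode() (itself deprecated as of PHP 8.2).

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is loaded.
$utf8 = 'ÄÖÜ';

// Treat every byte as an ISO-8859-1 character and encode it as UTF-8 again.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // original bytes
echo bin2hex($double), "\n"; // each original byte has become two bytes

// The damage is reversible by converting back once.
$restored = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');
echo $restored === $utf8 ? "restored\n" : "not restored\n";
```

Each umlaut turns into "Ã" (bytes c3 83) followed by an invisible control character, which is consistent with the "ÃÃÃ" reported in the question.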

一纸荒年 Trace。
#3 · 2019-09-15 18:12

Try changing the default charset on the server in your php.ini file:

default_charset = "UTF-8"
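To check what a given machine actually uses, the setting can be read at runtime. A minimal sketch; the ini_set() override is only for illustration, and php.ini remains the place for a permanent change.

```php
<?php
// Show the charset PHP currently appends to its default Content-Type header.
$before = ini_get('default_charset');
echo 'default_charset (configured): ', var_export($before, true), "\n";

// Override for the current request only (illustrative, not permanent).
ini_set('default_charset', 'UTF-8');
$after = ini_get('default_charset');
echo 'default_charset (this request): ', $after, "\n";
```

Running this on both the local machine and the server shows whether the two configurations differ.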

Also, make sure you are sending the proper Content-Type headers with charset UTF-8.

In my experience with UTF-8, if you configure the PHP mbstring module properly, use the mbstring functions, and make sure your database connection uses UTF-8, you won't have any problems.

The database part can be done for MySQL with the query "SET NAMES 'utf8'".
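With PDO, the connection charset can also be requested in the DSN instead of a separate query. A sketch only: the helper name mysqlUtf8Dsn, the host and the database name are made up for illustration, and the commented-out connection line assumes credentials that are not part of this thread.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that asks for a UTF-8 connection.
function mysqlUtf8Dsn(string $host, string $dbName): string
{
    // "charset" is a documented part of the PDO MySQL DSN.
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8', $host, $dbName);
}

$dsn = mysqlUtf8Dsn('localhost', 'example_db'); // placeholder values
echo $dsn, "\n";

// Usage with real credentials (assumed, not executed here):
// $pdo = new PDO($dsn, $user, $password);
```

MySQL's utf8 covers at most three bytes per character; utf8mb4 is the variant to use if four-byte characters matter.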

I usually start an output buffer that uses mbstring to handle the output, then send the buffer once the content has finished rendering. This is what I use on production websites, and it is a very solid approach.

Let me know if you would like the sample code for that.
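One way such a buffer could look, using mbstring's built-in mb_output_handler. This is a guess at the approach described, not the answerer's actual code, and it assumes the mbstring extension is loaded.

```php
<?php
// Assumption: mbstring is loaded and the script's strings are UTF-8.
mb_internal_encoding('UTF-8'); // encoding PHP strings are assumed to be in
mb_http_output('UTF-8');       // encoding the handler converts output to

ob_start('mb_output_handler'); // buffer output and pass it through mbstring

echo "UTF-8: ÄÖÜ\n";           // render the page content here

ob_end_flush();                // send the buffered content to the client
```

With both encodings set to UTF-8 the handler passes text through unchanged; its main use is when the internal and output encodings differ.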

Another easy trick to see whether PHP or the web server is sending the wrong headers is to open the View > Encoding menu in your browser and check whether it says UTF-8. If it doesn't, and switching it to UTF-8 makes everything look right, the problem is in your headers or content type. If it is already UTF-8 and the text is still garbled, something is going wrong in your code or database connection. If you are using MySQL, make sure the tables and columns involved are also UTF-8.

兄弟一词,经得起流年.
#4 · 2019-09-15 18:15

The cause of the problem was an old version of libxml (2.6.32) on the server. The development machine had 2.7.3. I upgraded libxml from an unstable package, which gave me version 2.7.8, and the problems are now gone.
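The libxml version PHP is using can be printed from PHP itself, which makes the two machines easy to compare. A short sketch; 2.7.3 is used as the comparison point only because it is the version reported as working here, not because it is a known threshold.

```php
<?php
// LIBXML_DOTTED_VERSION is defined by PHP's libxml extension, e.g. "2.7.8".
echo 'libxml version: ', LIBXML_DOTTED_VERSION, "\n";

// Compare against the version that worked on the development machine.
if (version_compare(LIBXML_DOTTED_VERSION, '2.7.3', '<')) {
    echo "Older than the version that worked locally.\n";
} else {
    echo "At least as new as the version that worked locally.\n";
}
```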

Summer. ? 凉城
#5 · 2019-09-15 18:16

Are you explicitly sending a content-type header? If you omit it, it's likely that Apache is sending one for you. If the file is served with a Latin-1 encoding (by Apache) and the browser reads it as such, then your UTF-8 characters will be malformed.

Try this:

<?php
echo "Drop some UTF-8 characters here.";

Then this:

<?php
header("Content-Type: text/html; charset=UTF-8");
echo "Drop some UTF-8 characters here.";

The second should work even if the first doesn't. You may also want to save the file as UTF-8 if it isn't already.

If your database characters are messed up, try setting the (My)SQL connection encoding.

Luminary・发光体
#6 · 2019-09-15 18:17

Check your Apache AddDefaultCharset setting.

On standard Debian Apache installations, the setting can be changed in /etc/apache2/conf.d/charset.

手持菜刀,她持情操
#7 · 2019-09-15 18:20

Please verify that your file is byte-for-byte identical to the one on your local machine. An FTP transfer in text mode could have corrupted it; you may want to try binary mode.
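A quick way to compare the two copies without downloading anything is to let the script report its own checksum and the raw bytes of a test string. A sketch, assuming the same script is run on both machines:

```php
<?php
// Checksum of this very file; the value should match on both machines.
$checksum = hash_file('sha256', __FILE__);
echo 'sha256: ', $checksum, "\n";

// Raw bytes of the umlauts as stored in the file.
// Two bytes per character starting with c3 indicate UTF-8; one byte each indicates Latin-1.
$hex = bin2hex('ÄÖÜ');
echo 'bytes:  ', $hex, "\n";
```

If the checksums differ, the transfer changed the file; if they match, the problem lies elsewhere.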
