Character encoding fail, why does \xBD display imp

Posted 2019-08-09 18:04

I'm just trying to understand character encoding a bit better, so I'm doing a few tests.

I have a PHP file that is saved as UTF-8 and looks like this:

<?php
declare(encoding='UTF-8');

header( 'Content-type: text/html; charset=utf-8' );
?><!DOCTYPE html>

<html>

<head>
    <meta charset="UTF-8" />
    <title>Test</title>
</head>

<body>
    <?php echo "\xBD"; # Does not work ?>
    <?php echo htmlentities( "\xBD" ) ; # Works ?>
</body>

</html>

The page itself shows this:

[image: screenshot of the rendered page]

The gist of the problem is that my web application has a bunch of character encoding problems: people copy and paste from Outlook or Word, and the characters get transformed into diamond question marks (do those have a real name?).

I'm trying to learn how to make sure all my input is converted to UTF-8 when the page loads (basically $_GET, $_POST, and $_REQUEST), and that all output is done using proper UTF-8 handling methods.

My question is: Why is my page showing the question mark for the first echo, and does anyone have any other information about making a UTF-8 safe web app in PHP?

Tags: php utf-8 character-encoding

2 Answers

疯言疯语

#2 · 2019-08-09 18:50

A lone 0xBD byte is not valid UTF-8. If you want to encode "½" (U+00BD) in UTF-8, then you need to use the two bytes 0xC2 0xBD instead.

>>> print(b'\xc2\xbd'.decode('utf-8'))  # Python 3: bytes literal, print() function
½
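
The same check can be sketched in PHP. This is only an illustration: the variable names are made up, it assumes the mbstring extension is loaded, and the `\u{...}` escape assumes PHP 7.0 or later.

```php
<?php
// A lone 0xBD byte: this is "½" in Latin-1, but it is not valid UTF-8.
$latin1Half = "\xBD";

// The UTF-8 encoding of U+00BD ("½") is the two bytes 0xC2 0xBD.
$utf8Half = "\xC2\xBD";

// PHP 7.0+ can also name the code point directly; it yields the same bytes.
$fromCodePoint = "\u{BD}";

var_dump(mb_check_encoding($latin1Half, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8Half, 'UTF-8'));   // bool(true)
var_dump(bin2hex($fromCodePoint));                 // string(4) "c2bd"
```

So in the question's page, `echo "\xC2\xBD";` would send a valid UTF-8 "½", whereas `echo "\xBD";` sends a stray byte that the browser has to replace.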

If you want to use text from another charset (Latin-1 in this case), then you need to transcode it to UTF-8 first, for example with iconv() or mb_convert_encoding().
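
A minimal sketch of that transcoding step, assuming the input really is ISO-8859-1 (Latin-1) and that the iconv and mbstring extensions are available. The variable names are hypothetical.

```php
<?php
// Hypothetical input: the single byte 0xBD, which is "½" in ISO-8859-1.
$latin1 = "\xBD";

// Option 1: mbstring. Argument order is (string, to_encoding, from_encoding).
$viaMb = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Option 2: iconv. Argument order is (from_encoding, to_encoding, string).
$viaIconv = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo bin2hex($viaMb), "\n";    // c2bd
echo bin2hex($viaIconv), "\n"; // c2bd
```

Note that the two functions take their arguments in a different order, which is easy to mix up. Text pasted from Word or Outlook may well be Windows-1252 rather than Latin-1; that is something to verify against your own input before picking the source encoding.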

Also:

$ charinfo �
U+FFFD REPLACEMENT CHARACTER


别忘想泡老子

#3 · 2019-08-09 19:04

\xBD is not valid UTF-8 on its own; what you want is \xC2\xBD. The question mark symbol is what applications substitute for invalid byte sequences, so if you see it in your UTF-8 text, the text is either not UTF-8 or it is corrupted.
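
One way to act on this for incoming data is to check each string and convert only the ones that fail. The helper below is a sketch, not a complete solution: `ensure_utf8` is a made-up name, it needs the mbstring extension, and it assumes that anything which is not valid UTF-8 is ISO-8859-1, which may not hold for your users.

```php
<?php
// Hypothetical helper: return $value as valid UTF-8.
// ASSUMPTION: input that is not already valid UTF-8 is in $fallback
// (ISO-8859-1 by default). A wrong guess produces wrong characters.
function ensure_utf8(string $value, string $fallback = 'ISO-8859-1'): string
{
    if (mb_check_encoding($value, 'UTF-8')) {
        return $value; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($value, 'UTF-8', $fallback);
}

echo bin2hex(ensure_utf8("\xBD")), "\n";     // c2bd (converted from Latin-1)
echo bin2hex(ensure_utf8("\xC2\xBD")), "\n"; // c2bd (already UTF-8, unchanged)
```

Checking first matters: converting a string that is already UTF-8 a second time would double-encode it.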
