Error: “Input is not proper UTF-8, indicate encoding !”

Posted 2019-01-04 08:53

I'm getting the error:

parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xED 0x6E 0x2C 0x20

This happens when I process an XML response from a 3rd party source with simplexml_load_string. The raw XML response does declare an encoding:

<?xml version="1.0" encoding="UTF-8"?>

Yet it seems that the XML is not really UTF-8. The language of the XML content is Spanish, and it contains words like Dublín.

I'm unable to get the 3rd party to sort out their XML.

How can I pre-process the XML and fix the encoding incompatibilities?

Is there a way to detect the correct encoding of an XML file?

10 answers
成全新的幸福
#2 · 2019-01-04 09:23

Instead of using JavaScript, you can simply put this line of code after your mysql_connect call:

mysql_set_charset('utf8',$connection);

Cheers.

何必那么认真
#3 · 2019-01-04 09:23

After several tries, I found that the htmlentities function works.

$value = htmlentities($value);
该账号已被封号
#4 · 2019-01-04 09:24

Can you open the 3rd party XML source in Firefox and see what it auto-detects as encoding? Maybe they are using plain old ISO-8859-1, UTF-16 or something else.

If they declare it to be UTF-8, though, and serve something else, their feed is clearly broken. Working around such a broken feed feels horrible to me (even though sometimes unavoidable, I know).

If it's a simple case like "UTF-8 versus ISO-8859-1", you can also try your luck with mb_detect_encoding().
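For the "UTF-8 versus ISO-8859-1" case, a minimal sketch of the pre-processing step might look like the following. It assumes the mbstring extension is available and that any input which is not valid UTF-8 is really ISO-8859-1 (consistent with the 0xED byte in the error, which is "í" in that encoding). The helper name xml_to_utf8 is made up for illustration, and it uses mb_check_encoding() rather than mb_detect_encoding(), since a validity check is enough when there are only two candidates.

```php
<?php
// Hypothetical helper (not from the original post): return XML text as valid UTF-8.
// Assumption: input that fails UTF-8 validation is ISO-8859-1.
function xml_to_utf8(string $xml): string
{
    // Already valid UTF-8: leave it untouched.
    if (mb_check_encoding($xml, 'UTF-8')) {
        return $xml;
    }
    // Otherwise treat the bytes as ISO-8859-1 and transcode them to UTF-8.
    return mb_convert_encoding($xml, 'UTF-8', 'ISO-8859-1');
}

// "Dubl\xEDn" is "Dublín" in ISO-8859-1; 0xED is the first byte from the error message.
$raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<city>Dubl\xEDn, Ireland</city>";
$doc = simplexml_load_string(xml_to_utf8($raw));
echo (string) $doc, "\n";
```

Because the declaration already says UTF-8, converting the bytes is enough here; the declaration itself does not need to be rewritten.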

forever°为你锁心
#5 · 2019-01-04 09:25

If you are sure that your XML is encoded in UTF-8 but contains bad characters, you can use this call to strip them:

$content = iconv('UTF-8', 'UTF-8//IGNORE', $content);
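As a variation on the same idea, here is a small sketch using mb_scrub() instead of iconv. This is a different technique from the call above: it assumes PHP 7.2 or later with the mbstring extension, and it replaces invalid byte sequences with a substitute character rather than dropping them. The sample string is invented for illustration.

```php
<?php
// Sample input (made up): valid UTF-8 except for a stray 0xED byte.
$content = "Dubl\xEDn, Ireland";

// mb_scrub() swaps every invalid byte sequence for the mbstring substitute character.
$clean = mb_scrub($content, 'UTF-8');

// The result can now be checked before handing it to an XML parser.
var_dump(mb_check_encoding($clean, 'UTF-8'));
```

Either way, the bad byte is discarded rather than recovered, so this only helps when the input really is UTF-8 with a few corrupt bytes in it.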
爱情/是我丢掉的垃圾
#6 · 2019-01-04 09:26

We recently ran into a similar issue and were unable to find any obvious cause. There turned out to be a control character in our string, but when we output that string to the browser the character was not visible unless we copied the text into an IDE.

We managed to solve our problem thanks to this post and this:

$input = preg_replace('/[\x00-\x1F\x7F]/', '', $input);
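Note that the range \x00-\x1F also covers tab, line feed and carriage return, which XML 1.0 allows. A slightly narrower sketch that keeps those three and removes only the control characters XML 1.0 forbids could look like this. The function name and sample string are hypothetical.

```php
<?php
// Hypothetical helper: strip control characters that XML 1.0 does not allow,
// keeping tab (0x09), line feed (0x0A) and carriage return (0x0D).
function strip_xml_control_chars(string $input): string
{
    // Fall back to the original string if preg_replace() reports an error (null).
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $input) ?? $input;
}

// Sample input (made up) with a backspace and a NUL byte hidden in it.
$dirty = "<note>line one\nline\x08 two\x00</note>";
$clean = strip_xml_control_chars($dirty);
echo $clean, "\n";
```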

\"骚年 ilove
#7 · 2019-01-04 09:30

When generating mapping files using Doctrine, I ran into the same issue. I fixed it by removing all the comments that some fields had in the database.
