I'm trying to load parse a Google Weather API response (Chinese response).
Here is the API call.
// This code fails with the following error
$xml = simplexml_load_file('http://www.google.com/ig/api?weather=11791&hl=zh-CN');
( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xB6 0xE0 0xD4 0xC6 in C:\htdocs\weather.php on line 11
Why does loading this response fail?
How do I encode/decode the response so that simplexml
loads it properly?
Edit: Here is the code and output.
<?php
$googleData = file_get_contents('http://www.google.com/ig/api?weather=11102&hl=zh-CN');
$xml = simplexml_load_string($googleData);
( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: Entity: line 1: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xB6 0xE0 0xD4 0xC6 in C:\htdocs\test4.php on line 3 Call Stack Time Memory Function Location 1 0.0020 314264 {main}( ) ..\test4.php:0 2 0.1535 317520 simplexml_load_string ( string(1364) ) ..\test4.php:3
( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: t_system data="SI"/>
( ! ) Warning: simplexml_load_string() [function.simplexml-load-string]: ^ in C:\htdocs\test4.php on line 3 Call Stack Time Memory Function Location 1 0.0020 314264 {main}( ) ..\test4.php:0 2 0.1535 317520 simplexml_load_string ( string(1364) ) ..\test4.php:3
Update: I can reproduce the problem. Also, Firefox is auto-sniffing the character set as "chinese simplified" when I output the raw XML feed. Either the Google feed is serving incorrect data (Chinese Simplified characters instead of UTF-8 ones), or it is serving different data when not fetched in a browser - the content-type header in Firefox clearly says
utf-8
.Converting the incoming feed from Chinese Simplified (GB18030, this is what Firefox gave me) into UTF-8 works:
it doesn't explain nor fix the underlying problem yet, though. I don't have time to take a deep look into this right now, maybe somebody else does. To me, it looks like Google are in fact serving incorrect data (which would surprise me. I didn't know they made mistakes like us mortals. :P)
Just came accross this. This seems to work (the function itself I found on the web, just updated it a bit).:
This is the script I have made in php to parse Google Weather API.
The problem here is that SimpleXML doesn't look at the HTTP header to determine the character encoding used in the document and simply assumes it's UTF-8 even though Google's server does advertise it as
You can write a function that will take a look at that header using the super-secret magic variable
$http_response_header
and transform the response accordingly. Something like that:Try to add in the url query parameter eo = utf-8. In this case, the answer will be exclusively the UTF-8 encoding. It helped me.