Strange Characters In XML Response From Google Wea

Posted 2019-07-25 07:28

I've just launched a small application I've been working on. Nothing major, but something I would like to get working properly. It's at www.wedrapp.com.

Most of the time it works perfectly fine. Enter a city, XML is returned, parsed and the data returned is shown to the user.

Unfortunately, an error is returned when certain cities are searched, such as Marseille. If you search for Marseille you will see what I mean. I have a feeling it is to do with special characters, as a search for Marseille actually returns Marseilles, Provence-Alpes-Côte d'Azur in the XML. Similarly, Paris gives an error, as it actually returns Paris, Île-de-France.

Can anyone shed some light on how to strip these strange characters out, or at least stop them causing an error before the data reaches the screen? The XML is parsed with PHP.

1 answer
老娘就宠你
Reply #2 · 2019-07-25 07:56

Find out which encoding the XML returned by Google uses, re-encode it from that encoding to UTF-8, and then load the XML with SimpleXML.
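A minimal sketch of that re-encode-then-parse step, assuming the source encoding is already known (ISO-8859-1 here) and that the mbstring extension is available. The helper name loadXmlAsUtf8 and the sample XML are made up for illustration and are not a real API response. mb_convert_encoding is used rather than utf8_encode because it accepts any source encoding, not only ISO-8859-1.

```php
<?php
// Hypothetical helper: converts raw XML bytes from $sourceEncoding to UTF-8
// and parses the result with SimpleXML. Assumes the document carries no
// XML declaration that names a different encoding.
function loadXmlAsUtf8(string $rawXml, string $sourceEncoding): SimpleXMLElement
{
    $utf8Xml = mb_convert_encoding($rawXml, 'UTF-8', $sourceEncoding);
    $xml = simplexml_load_string($utf8Xml);
    if ($xml === false) {
        throw new RuntimeException('Could not parse XML after re-encoding.');
    }
    return $xml;
}

// Made-up sample: "Île-de-France" as ISO-8859-1 bytes (0xCE is "Î" in Latin-1).
$raw = "<city data=\"Paris, \xCEle-de-France\"/>";
$xml = loadXmlAsUtf8($raw, 'ISO-8859-1');
echo $xml['data'], "\n";
```

Passing the same bytes straight to simplexml_load_string without the conversion would likely fail, because a lone 0xCE byte is not valid UTF-8.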

The Google Weather API returns XML in an encoding that depends on the language specified in the request. (You can also specify the encoding you want, which is covered further down.)

For example, it can be ISO-8859-2 as a related question PHP XML — Google Weather API - parsing and modifying data (Language, UTF-8, and F to Celsius) shows.

You can find out which one by looking into the HTTP Response Header Content-Type:

Content-Type: text/xml; charset=ISO-8859-1
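As a rough illustration, the charset could be pulled out of such a header value with a small regular expression. The function name charsetFromContentType is hypothetical, and how you obtain the header string depends on the HTTP client you use.

```php
<?php
// Hypothetical helper: extracts the charset parameter from a Content-Type
// header value. Returns null when no charset parameter is present.
function charsetFromContentType(string $contentType): ?string
{
    // Accepts both charset=ISO-8859-1 and charset="ISO-8859-1".
    if (preg_match('/charset\s*=\s*"?([^\s;"]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    return null;
}

$charset = charsetFromContentType('text/xml; charset=ISO-8859-1');
echo $charset ?? 'unknown', "\n";
```

The returned name can then be used as the source encoding when converting the response body to UTF-8.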

You used utf8_encode to change the encoding; it converts an ISO-8859-1 (also referred to as Latin-1) encoded string to UTF-8. It looks like standard queries to the secret Google Weather API return ISO-8859-1 by default.

You can specify the encoding you'd like to have by adding an oe parameter to the query. For example, to get it directly as UTF-8:

http://www.google.com/ig/api?weather=Mountain+View&oe=utf-8
                                                   ^

Doing this ensures you always get a known encoding, so you don't need to guess it or parse the response headers.
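A small sketch of building such a URL in PHP, so the city name is escaped properly and oe=utf-8 is always included. The endpoint and the weather and oe parameters are the ones shown above; the function name weatherUrl is hypothetical.

```php
<?php
// Hypothetical helper: builds the request URL with an explicit output
// encoding (oe=utf-8) for the given city name.
function weatherUrl(string $city): string
{
    // The separator is passed explicitly so the result does not depend on
    // the arg_separator.output ini setting.
    $query = http_build_query(
        ['weather' => $city, 'oe' => 'utf-8'],
        '',
        '&'
    );
    return 'http://www.google.com/ig/api?' . $query;
}

echo weatherUrl('Mountain View'), "\n";
```

http_build_query encodes the space as + by default, which matches the example URL above.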
