Parsing UTF8 encoded data from a Web Service

Posted 2019-01-29 07:08

Question:

I'm parsing the data from http://toutankharton.com/ws/localisations.php?l=75

As you can see, the text comes back mis-encoded (<name>Paris 2Ã¨me</name>).

My code is the following:

using (var reader = new StreamReader(stream, Encoding.UTF8))
{
    var contents = reader.ReadToEnd();

    XElement cities = XElement.Parse(contents);

    var t = from city in cities.Descendants("city")
            select new City
            {
                Name = city.Element("name").Value,
                Insee = city.Element("ci").Value,
                Code = city.Element("code").Value,
            };
}

Isn't new StreamReader(stream, Encoding.UTF8) sufficient?

Answer 1:

That looks like what happens when UTF-8 bytes are decoded with an incompatible encoding such as ISO-8859-1. Do you know what the real character is? Going the other way, using ISO-8859-1 to get a byte array and then UTF-8 to read it, gives "è".

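// "Ã¨" is what the UTF-8 bytes of "è" look like when decoded as ISO-8859-1.
var input = "Ã¨";
// Re-encode as ISO-8859-1 to recover the original bytes, then decode them as UTF-8.
var bytes = Encoding.GetEncoding("ISO-8859-1").GetBytes(input);
var realString = Encoding.UTF8.GetString(bytes); // "è"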
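
To see the round trip in both directions, here is a minimal sketch. The class and method names (MojibakeDemo, Garble, Repair) are made up for illustration and do not come from the question. It also assumes the wrong decoding step was ISO-8859-1, as guessed above; if the service actually used Windows-1252, some characters would need that encoding instead.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // ISO-8859-1 maps each byte 0x00-0xFF to the code point with the same value,
    // so encoding and decoding with it never loses bytes.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Simulates the bug: UTF-8 bytes decoded with the wrong single-byte encoding.
    public static string Garble(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        return Latin1.GetString(utf8Bytes);
    }

    // Reverses the bug: recover the original bytes, then decode them as UTF-8.
    public static string Repair(string garbled)
    {
        byte[] originalBytes = Latin1.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        string garbled = Garble("Paris 2\u00E8me"); // \u00E8 is "è"
        Console.WriteLine(garbled);
        Console.WriteLine(Repair(garbled));
    }
}
```

If the text is already garbled in the raw response from the web service, the StreamReader is not at fault and the repair has to be applied to each value after parsing, or better, the service itself should be fixed.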


Tags: c# encoding