I receive an XML file with encoding "ISO-8859-1" (Latin-1)
Within the file (among other tags) I have <OtherText>Example "content" And ─</OtherText>
Now for some reason when I load this into XMLTextReader and do a "XmlReader.Value" to return the value, it returns: "content" And ─
This then, when confronted with a database only accepting Latin-1 encoding, obviously errors.
I have tried the following:
- Converting into bytes and using Encoding.Convert to change from UTF-8 into Latin-1 (which successfully gives me a bunch of "?" instead)
- Using StreamReader(file,Encoding.whatever) to load the file into XmlTextReader
And several variations thereof, plus different methods from the internet and from StackOverflow itself.
I understand that .NET strings are UTF-16. What I don't understand is why, given a fully Latin-1 formatted XML file with CORRECT markup for when UTF-8 characters exist (which is compatible with older databases AND the web, for HTML markup etc.), the reader simply overrides that and outputs the UTF-8 encoded string ANYWAY.
Is there no way to get around this other than writing my own custom text parser?
I do not believe this is a problem with the encoding. What you're seeing is the XML string being un-escaped.
The problem is that the " in your file is stored as an XML escape sequence (&quot;), so XMLTextReader will un-escape it for you.
If you change this:
To this:
Then
You'll need to wrap your value in CDATA so it is ignored by the parser.
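To illustrate the CDATA point, here is a minimal sketch. It assumes the element content is escaped with &quot; and uses XmlReader.Create over a string rather than XmlTextReader over a file; the class name CDataDemo and the helper ReadOtherText are made up for this example.

```csharp
using System;
using System.IO;
using System.Xml;

class CDataDemo
{
    // Hypothetical helper: returns the text content of the first <OtherText> element.
    static string ReadOtherText(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("OtherText");
            return reader.ReadElementContentAsString();
        }
    }

    static void Main()
    {
        // Plain element content: the parser resolves the entity.
        string escaped = "<OtherText>Example &quot;content&quot;</OtherText>";

        // CDATA content: the parser hands the characters back untouched.
        string cdata = "<OtherText><![CDATA[Example &quot;content&quot;]]></OtherText>";

        Console.WriteLine(ReadOtherText(escaped)); // Example "content"
        Console.WriteLine(ReadOtherText(cdata));   // Example &quot;content&quot;
    }
}
```

This only helps if you control how the file is produced, since the CDATA wrapper has to be in the XML before it reaches the reader.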
Another option is to re-escape the string:
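A minimal sketch of what re-escaping could look like, assuming the goal is a string that contains only Latin-1 characters. Latin1Xml.Escape is a hypothetical helper written for this example, not a .NET API: it escapes the five predefined XML entities and writes anything above U+00FF as a decimal character reference.

```csharp
using System;
using System.Globalization;
using System.Text;

static class Latin1Xml
{
    // Hypothetical helper: re-escapes a decoded string so that it is
    // XML-safe and contains no characters outside Latin-1 (U+0000..U+00FF).
    public static string Escape(string value)
    {
        var sb = new StringBuilder(value.Length);
        for (int i = 0; i < value.Length; i++)
        {
            char c = value[i];
            switch (c)
            {
                case '&': sb.Append("&amp;"); break;
                case '<': sb.Append("&lt;"); break;
                case '>': sb.Append("&gt;"); break;
                case '"': sb.Append("&quot;"); break;
                case '\'': sb.Append("&apos;"); break;
                default:
                    if (c <= 0xFF)
                    {
                        // Representable in Latin-1: keep as-is.
                        sb.Append(c);
                    }
                    else
                    {
                        // Outside Latin-1: emit a numeric character reference.
                        int codePoint = c;
                        if (char.IsSurrogatePair(value, i))
                        {
                            codePoint = char.ConvertToUtf32(value, i);
                            i++; // skip the low surrogate
                        }
                        sb.Append("&#")
                          .Append(codePoint.ToString(CultureInfo.InvariantCulture))
                          .Append(';');
                    }
                    break;
            }
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main()
    {
        // "\u2500" is the box-drawing character from the question.
        string decoded = "Example \"content\" And \u2500";
        Console.WriteLine(Latin1Xml.Escape(decoded));
        // Example &quot;content&quot; And &#9472;
    }
}
```

If only the five predefined entities matter, the built-in System.Security.SecurityElement.Escape covers those, but it does not convert non-Latin-1 characters into numeric references.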