I have an XML document, file.xml, which is encoded in ISO-Latin-15 (aka ISO-Latin-9):
<?xml version="1.0" encoding="iso-8859-15"?>
<root xmlns="http://stackoverflow.com/demo">
<f>€.txt</f>
</root>
My favorite text editor confirms that this file is correctly encoded in ISO-Latin-15 (it is not UTF-8).
My software is written in C# and needs to extract the element f.
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("file.xml");
In real life, I use an XmlResolver to set credentials, but basically my code is as simple as that. Loading succeeds and no exception is raised.
The problem appears when I extract the value:
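// xnsm is the XmlNamespaceManager; XPath needs a prefix ("d" is arbitrary) for the default namespace
XmlNamespaceManager xnsm = new XmlNamespaceManager(xmlDoc.NameTable);
xnsm.AddNamespace("d", "http://stackoverflow.com/demo");
XmlNode n = xmlDoc.SelectSingleNode("//d:root/d:f", xnsm);
if (n != null)
{
    string filename = n.InnerText;
}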
The Visual Studio debugger displays filename = □.txt.
That could just be a display bug in Visual Studio, but unfortunately File.Exists(filename) returns false, even though the file actually exists.
What's wrong?
Don't just use the debugger or the console to display the string as a string.
Instead, dump the contents of the string one character at a time, printing the numeric value of each character.
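A minimal C# sketch of such a dump. The class and method names (StringDump, CodeUnits) are illustrative, and the hard-coded string is a stand-in for n.InnerText. Each UTF-16 code unit is printed next to its hexadecimal value, so a correctly decoded € should show up as U+20AC (EURO SIGN); a value such as U+00A4 or U+FFFD would instead suggest the bytes were decoded with the wrong encoding.

```csharp
using System;
using System.Linq;

static class StringDump
{
    // Returns one "U+XXXX" entry per UTF-16 code unit of s.
    public static string[] CodeUnits(string s) =>
        s.Select(c => $"U+{(int)c:X4}").ToArray();

    static void Main()
    {
        // Hypothetical stand-in for n.InnerText; "\u20AC" is the euro sign.
        string filename = "\u20AC.txt";
        string[] codes = CodeUnits(filename);
        for (int i = 0; i < filename.Length; i++)
        {
            // Print each character beside its code point, so the font cannot hide it.
            Console.WriteLine($"{filename[i]} {codes[i]}");
        }
    }
}
```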
That will show you the real contents of the string, in terms of Unicode code points, instead of being constrained by what the current font can display.
Use the Unicode code charts to look up the characters specified.
Does your XML declare its encoding correctly? Is encoding="iso-8859-15" really ISO-Latin-15?
Ideally, you should put your content inside a CDATA section, so the XML would look like:
<f><![CDATA[€.txt]]></f>
You should also escape all special characters with their URL-encoded (or HTTP-encoded) equivalents, because XML is typically used to communicate over HTTP.
I don't know the exact escape code for €, but escaping it that way should let € be transmitted correctly in the XML.
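As a hedged illustration in C#, here are two ways to escape €; the helper names (EuroEscapes, UrlEscape, XmlCharRefs) are hypothetical. Uri.EscapeDataString percent-encodes the UTF-8 bytes of the string, which gives the URL-encoded form, while an XML numeric character reference such as &#x20AC; is the escape XML itself defines and does not depend on the file's declared encoding.

```csharp
using System;
using System.Text;

static class EuroEscapes
{
    // Percent-encodes s from its UTF-8 bytes; unreserved characters such as '.' stay as-is.
    public static string UrlEscape(string s) => Uri.EscapeDataString(s);

    // Replaces each non-ASCII UTF-16 code unit with an XML numeric character reference.
    // Simplification: assumes BMP-only text (no surrogate pairs) and does not escape &, < or >.
    public static string XmlCharRefs(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            if (c < 128)
                sb.Append(c);
            else
                sb.Append($"&#x{(int)c:X4};");
        }
        return sb.ToString();
    }

    static void Main()
    {
        string name = "\u20AC.txt"; // euro sign followed by ".txt"
        Console.WriteLine(UrlEscape(name));   // %E2%82%AC.txt
        Console.WriteLine(XmlCharRefs(name)); // &#x20AC;.txt
    }
}
```

Either form keeps the payload pure ASCII, so a mismatch between the declared and actual file encoding can no longer corrupt the € character.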
If I remember correctly, the XmlDocument.Load(string) method always assumes UTF-8, regardless of the XML encoding. You would have to create a StreamReader with the correct encoding and use that as the parameter.

EDIT:
I just stumbled across KB308061 from Microsoft. There's an interesting passage: