I have a simple XML document:
<data>
<node1>value1</node1>
<node2>value2</node2>
</data>
I'm using IXmlSerializable to read and write this XML with DTOs. The following code works just fine:
XmlReader reader;
...
while( reader.Read() ){
Console.Write( reader.ReadElementContentAsString() );
}
// outputs value1value2
However, if the whitespace in the XML is removed, i.e.
<data>
<node1>value1</node1><node2>value2</node2>
</data>
or if I set XmlReaderSettings.IgnoreWhitespace = true, the code outputs only "value1" and ignores the second node. When I print the nodes that the parser traverses, I can see that ReadElementContentAsString moves the pointer to the EndElement of node2, but I don't understand why that happens or how to fix it.
Is it a possible XML parser implementation bug?
===============================================
Here's some sample code and two XML samples that produce different results:
string homedir = Path.GetDirectoryName(Application.ExecutablePath);
string xml = Path.Combine( homedir, "settings.xml" );
FileStream stream = new FileStream( xml, FileMode.Open );
XmlReaderSettings readerSettings = new XmlReaderSettings();
readerSettings.IgnoreWhitespace = false;
XmlReader reader = XmlTextReader.Create( stream, readerSettings );
while( reader.Read() ){
    if ( reader.MoveToContent() == XmlNodeType.Element && reader.Name != "data" ){
        System.Diagnostics.Trace.WriteLine(
            reader.NodeType
            + " "
            + reader.Name
            + " "
            + reader.ReadElementContentAsString()
        );
    }
}
stream.Close();
1.) settings.xml
<?xml version="1.0"?>
<data>
<node-1>value1</node-1>
<node-2>value2</node-2>
</data>
2.) settings.xml
<?xml version="1.0"?>
<data>
<node-1>value1</node-1><node-2>value2</node-2>
</data>
Using (1) prints:
Element node-1 value1
Element node-2 value2
Using (2) prints:
Element node-1 value1
This is not nearly as robust as Luca's answer, but I've found the following pattern useful with reasonably 'predictable' XML (variations in whitespace and values only). Consider:
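The code that originally followed "Consider:" did not survive, so what follows is only a minimal sketch of what such a pattern might look like, not the answerer's own code. It assumes every child of `data` holds plain, non-empty text (an empty element such as `<node-1/>` would need extra handling), and the class name and inline XML string are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

class ReadThenReadContent
{
    static void Main()
    {
        // Hypothetical inline copy of the question's second sample
        // (no whitespace between the two child elements).
        string xml = "<data>\n<node-1>value1</node-1><node-2>value2</node-2>\n</data>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name != "data")
                {
                    string name = reader.Name;

                    // Step from the start tag onto the element's text node.
                    reader.Read();

                    // Read the text; the reader stops on the end tag, so the
                    // Read() in the loop condition lands on whatever follows it.
                    string value = reader.ReadContentAsString();

                    Console.WriteLine(name + " " + value);
                }
            }
        }
    }
}
```

Because the reader is left on the end tag rather than on the next sibling, the loop's own `Read()` should not skip an element that immediately follows, whether or not whitespace separates the nodes.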
More generically, in lieu of reader.ReadElementContent*(), use reader.Read() followed by reader.ReadContent*().
If you don't want the XmlReader to read the whitespace, you should initialize the XmlReader with the settings as follows:
It works for me with an XML file of the structure you posted:
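The settings and reading code from this answer were lost, so here is a hedged sketch of one way it could look. It assumes the XML has the shape from the question, and that the loop does not call `Read()` in addition to `ReadElementContentAsString()`; the class name and inline XML are illustrative only.

```csharp
using System;
using System.IO;
using System.Xml;

class IgnoreWhitespaceExample
{
    static void Main()
    {
        // Hypothetical inline copy of the question's XML.
        string xml =
            "<?xml version=\"1.0\"?>\n" +
            "<data>\n" +
            "  <node1>value1</node1>\n" +
            "  <node2>value2</node2>\n" +
            "</data>";

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true; // do not report insignificant whitespace nodes

        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            reader.MoveToContent();          // skip the XML declaration
            reader.ReadStartElement("data"); // now positioned on the first child

            // With whitespace ignored, ReadElementContentAsString leaves the
            // reader directly on the next sibling (or on </data>), so no
            // extra Read() is needed between elements.
            while (reader.NodeType == XmlNodeType.Element)
            {
                string name = reader.Name;
                Console.WriteLine(name + " " + reader.ReadElementContentAsString());
            }

            reader.ReadEndElement(); // consume </data>
        }
    }
}
```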
What happens is that reader.Read() reads the whitespace node. When the whitespace is ignored, the same call reads the second element instead (it swallows an XML token), moving the pointer to the node2 element.
Debug the reader properties before and after the methods called in your example. Check the NodeType and Value properties. Also have a look at the MoveToContent method; it is very useful.
Read the documentation for all of those methods and properties, and you will end up learning how the XmlReader class works and how to use it for your purposes. Here is the first Google result: it contains a very explicit example.
I ended up with the following (not complete) pattern:
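The original pattern is missing here, so this is only a sketch of one possible shape for it, under the assumption that the goal is to avoid advancing the reader twice. It calls `Read()` only when the current node was not consumed by `ReadElementContentAsString()`. The class name and inline XML are hypothetical.

```csharp
using System;
using System.IO;
using System.Xml;

class ReadWithoutSkipping
{
    static void Main()
    {
        // Hypothetical inline copy of the question's second sample.
        string xml = "<data>\n<node-1>value1</node-1><node-2>value2</node-2>\n</data>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // position on <data>

            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name != "data")
                {
                    string name = reader.Name;

                    // This call already moves past the end tag, so the reader
                    // may now sit on the next element; do not call Read() here.
                    string value = reader.ReadElementContentAsString();

                    Console.WriteLine(name + " " + value);
                }
                else
                {
                    // Any other node (root element, whitespace, end tag): advance.
                    reader.Read();
                }
            }
        }
    }
}
```

This variant does not depend on the IgnoreWhitespace setting, since the loop decides per node whether an explicit advance is still needed.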
Per the documentation on XmlReaderSettings.IgnoreWhitespace, a new line is not considered insignificant.