Ignore whitespace while reading XML

Posted 2019-02-26 11:50

Question:

I have XML in the following format:

<Tag>
    Value
</Tag>

This comes from an external data source that I cannot change. When I read it with XmlReader, the element content includes the line breaks and whitespace.

// xsdStream and xmlFilename are defined elsewhere.
XmlReaderSettings xmlSettings = new XmlReaderSettings();
xmlSettings.Schemas = new System.Xml.Schema.XmlSchemaSet();
XmlReader schemaReader = XmlReader.Create(xsdStream);
xmlSettings.Schemas.Add("", schemaReader);
xmlSettings.ValidationType = ValidationType.Schema;
XmlReader reader = XmlReader.Create(xmlFilename, xmlSettings);
// Parse the XML file.
while (reader.Read())
{
    if (reader.IsStartElement())
    {
        switch (reader.Name)
        {
            case "Tag":
                // Note: this call moves the reader past </Tag>, so the
                // loop's next Read() skips the node that follows it.
                string value = reader.ReadElementContentAsString();
                Console.WriteLine(value);
                break;
        }
    }
}

How can I avoid this?

Answer 1:

Not working answer

This answer doesn't seem to work, but I'm leaving it for the moment to avoid anyone else suggesting it. I'll delete this if someone posts a better answer.

Did you try setting XmlReaderSettings.IgnoreWhitespace?

From the documentation: "White space that is not considered to be significant includes spaces, tabs, and blank lines used to set apart the markup for greater readability. An example of this is white space in element content."

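This doesn't affect ReadElementContentAsString or even the Value property of a text node, presumably because the setting only drops nodes that consist entirely of whitespace. Whitespace surrounding actual text is part of that text node, so it is kept.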
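
To see this behaviour for yourself, here is a minimal sketch. The sample XML string and the class name are made up for illustration, and schema validation is left out to keep it short. The brackets in the output make any leading or trailing whitespace visible.

```csharp
using System;
using System.IO;
using System.Xml;

class IgnoreWhitespaceDemo
{
    static void Main()
    {
        // Hypothetical sample mirroring the format from the question.
        string xml = "<Root>\n  <Tag>\n    Value\n  </Tag>\n</Root>";

        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            // Move to the first <Tag> element and read its text content.
            reader.ReadToFollowing("Tag");
            string value = reader.ReadElementContentAsString();

            // The line breaks and indentation around "Value" are still present.
            Console.WriteLine("[" + value + "]");
        }
    }
}
```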

Simple answer

You could just call Trim:

string value = reader.ReadElementContentAsString().Trim();

That won't remove line breaks between lines of content, of course. If you need to do that, you could always use string.Replace.
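
If the content can span several lines, one possible approach is to collapse every run of whitespace into a single space. The sketch below uses a regular expression rather than string.Replace, since it handles spaces, tabs, and line breaks in one pass. The helper name Collapse and the sample text are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class WhitespaceHelper
{
    // Hypothetical helper: replaces each run of whitespace (spaces, tabs,
    // line breaks) with one space, then trims both ends.
    public static string Collapse(string text)
    {
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}

class CollapseDemo
{
    static void Main()
    {
        string raw = "\n    First line\n    Second line\n";
        Console.WriteLine("[" + WhitespaceHelper.Collapse(raw) + "]");
        // prints "[First line Second line]"
    }
}
```

In the reader loop from the question this would be called as WhitespaceHelper.Collapse(reader.ReadElementContentAsString()).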

(As I mentioned in the comment, I'd personally prefer LINQ to XML over XmlReader unless you're genuinely reading something too large to fit in memory, but that's a separate matter.)
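
For comparison, a rough LINQ to XML version of the same read might look like the sketch below. It assumes the document fits in memory, uses a made-up sample string in place of the real file, and omits the schema validation from the question. The values still need trimming, because LINQ to XML keeps the whitespace inside the text just as XmlReader does.

```csharp
using System;
using System.Xml.Linq;

class LinqToXmlDemo
{
    static void Main()
    {
        // Hypothetical sample; for a file, XDocument.Load(xmlFilename) would be used instead.
        string xml = "<Root>\n  <Tag>\n    Value\n  </Tag>\n</Root>";
        XDocument doc = XDocument.Parse(xml);

        // Visit every <Tag> element and print its trimmed text content.
        foreach (XElement tag in doc.Descendants("Tag"))
        {
            Console.WriteLine(tag.Value.Trim());
        }
    }
}
```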