Reading from a stream with mixed XML and plain text

Posted 2019-05-23 15:02

Question:

I have a text stream that contains segments of both arbitrary plain text and well-formed XML elements. How can I read it and extract only the XML elements? XmlReader with ConformanceLevel set to Fragment still throws an exception when it encounters plain text, which it treats as malformed XML.

Any ideas? Thanks

Here's my code so far:

XmlReaderSettings settings = new XmlReaderSettings();
settings.ConformanceLevel = ConformanceLevel.Fragment;

using (XmlReader reader = XmlReader.Create(stream, settings))
    while (!reader.EOF)
    {
        reader.MoveToContent();
        XmlDocument doc = new XmlDocument();
        doc.Load(reader.ReadSubtree());
        reader.ReadEndElement();
    }

Here's some sample stream content (I have no control over it, by the way):

Found two objects:
Object a
<object>
    <name>a</name>
    <description></description>
</object>
Object b
<object>
    <name>b</name>
    <description></description>
</object>

Answer 1:

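This is admittedly a hack, but if you wrap the mixed document in a "fake" XML root node, you should be able to get what you need by taking only the children of the root element whose node type is Element (i.e. skipping the text nodes). Note that this only works as long as the plain text itself contains no '<' or '&' characters, since those would make the wrapped document malformed: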

using System;
using System.Linq;
using System.Xml;

static class Program {

    static void Main(string[] args) {

        string mixed = @"
Found two objects:
Object a
<object>
    <name>a</name>
    <description></description>
</object>
Object b
<object>
    <name>b</name>
    <description></description>
</object>
";
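        // Wrap the mixed content in a dummy root so it parses as a single document.
        string xml = "<FOO>" + mixed + "</FOO>";
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        // Keep only the element children of the root; text nodes are skipped.
        var xmlFragments = from XmlNode node in doc.DocumentElement.ChildNodes
                           where node.NodeType == XmlNodeType.Element
                           select node;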
        foreach (var fragment in xmlFragments) {
            Console.WriteLine(fragment.OuterXml);
        }

    }

}
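
Since the question is about a stream rather than a string, here is a minimal sketch of a variant that skips the fake root and reads directly from a TextReader. It assumes, as above, that the plain text contains no '<' or '&' characters. As far as I can tell, ConformanceLevel.Fragment does accept top-level text and reports it as Text nodes, so the exception in the question most likely comes from calling ReadSubtree while positioned on a text node (MoveToContent stops on non-whitespace text). The names FragmentExtractor and ExtractElements are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper, not part of any library.
static class FragmentExtractor
{
    // Returns the outer XML of every top-level element, ignoring top-level text.
    public static List<string> ExtractElements(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };
        var result = new List<string>();

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.Read(); // move to the first node (EOF becomes true if there is none)
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // ReadOuterXml returns the element's markup and leaves the
                    // reader positioned after the matching end tag.
                    result.Add(reader.ReadOuterXml());
                }
                else
                {
                    // Text, whitespace, comments, etc. are skipped.
                    reader.Read();
                }
            }
        }
        return result;
    }

    static void Main()
    {
        string mixed =
            "Found two objects:\n" +
            "Object a\n" +
            "<object><name>a</name><description></description></object>\n" +
            "Object b\n" +
            "<object><name>b</name><description></description></object>\n";

        foreach (string fragment in ExtractElements(new StringReader(mixed)))
        {
            Console.WriteLine(fragment);
        }
    }
}
```

For a real stream, you could pass new StreamReader(stream) instead of the StringReader. Unlike the fake-root version, this does not need the whole input in memory as one string, but it fails in the same way if the plain text is not XML-safe.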


Tags: c# .net xml stream