Single-pass Read and Validate XML vs referenced XSD

Posted 2019-03-06 13:58

Question:

I'm trying to read the data from an XML file into a single data structure (such as XmlDocument) while validating it against the XSD it references. I have a solution, but it requires two passes through the file, and I'm wondering if there's a single-pass solution.

MyBooks.xml:

<Books xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
     xsi:noNamespaceSchemaLocation='books.xsd' id='999'>
    <Book>Book A</Book>
    <Book>Book B</Book>
</Books>

Books.xsd:

<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'
    elementFormDefault='qualified'
    attributeFormDefault='unqualified'>
    <xs:element name='Books'>
        <xs:complexType>
            <xs:sequence>
                <xs:element name='Book' type='xs:string' maxOccurs='unbounded' />
            </xs:sequence>
            <xs:attribute name='id' type='xs:unsignedShort' use='required' />
        </xs:complexType>
    </xs:element>
</xs:schema>

Let's say MyBooks.xml and Books.xsd are in the same directory.

Validate:

//Given a filename pointing to the XML file
var settings = new XmlReaderSettings();

settings.ValidationType = ValidationType.Schema;

settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessInlineSchema;
settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;
settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;

settings.CloseInput = true;
settings.ValidationEventHandler += new ValidationEventHandler(ValidationCB);
//eg:
//private static void ValidationCB(object sender, ValidationEventArgs args)
//{ throw new ApplicationException(args.Message); }

using(var reader = XmlReader.Create(filename, settings))
{ while(reader.Read()) ; }

Read into XmlDocument:

XmlDocument x = new XmlDocument();
x.Load(filename);

Sure, I could collect the nodes myself while the XmlReader is reading, but I'd rather not if I can avoid it. Any suggestions?

Thanks in advance

Answer 1:

You're very close. Use a validating reader to load your XML; that way validation happens during loading, in one pass. Validation errors will not stop the document from loading, as long as your handler records them instead of throwing.

These are the high-level steps I usually follow in a ValidateXml helper function. It all starts with a compiled XmlSchemaSet:

public bool ValidateXml(string filename, XmlSchemaSet xset)
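The answer does not show how the set is built, so here is a minimal sketch under the assumption that the schema lives in a single file. `SchemaSetSketch` and `BuildSchemaSet` are hypothetical names used only for illustration.

```csharp
using System.Xml.Schema;

public static class SchemaSetSketch
{
    // Hypothetical helper: loads one schema file and compiles the set.
    public static XmlSchemaSet BuildSchemaSet(string xsdPath)
    {
        var xset = new XmlSchemaSet();
        // Passing null as the namespace tells the set to use the
        // targetNamespace declared inside the schema file (if any).
        xset.Add(null, xsdPath);
        // Compile up front so schema errors surface before any document is read.
        xset.Compile();
        return xset;
    }
}
```

For the question's files this would be called as `SchemaSetSketch.BuildSchemaSet("Books.xsd")`, and the result passed to the helper as `xset`.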

I set the reader settings (which you did, too):

XmlReaderSettings settings = new XmlReaderSettings
{
    ValidationType = ValidationType.Schema,
    Schemas = xset,
    ConformanceLevel = ConformanceLevel.Document
};
settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
// Use your own helper class that collects validation events.
// SmartValidationHandler and DefaultResolver are not part of the .NET class library.
XsdUtils.Utils.SmartValidationHandler svh = new XsdUtils.Utils.SmartValidationHandler(Paschi.Xml.DefaultResolver.Instance);
settings.ValidationEventHandler += svh.ValidationCallbackOne;
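The source of SmartValidationHandler is not shown, so the following is only a guess at the minimum such a collector needs: it records every event and remembers whether any of them was an error. The class name `CollectingValidationHandler` and its `Events` property are assumptions; `HasError` and `ValidationCallbackOne` mirror the member names used in this answer.

```csharp
using System.Collections.Generic;
using System.Xml.Schema;

// Hypothetical stand-in for the answer's SmartValidationHandler.
public sealed class CollectingValidationHandler
{
    // Every warning and error raised during the read, in document order.
    public List<ValidationEventArgs> Events { get; } = new List<ValidationEventArgs>();

    // True once at least one event with Error severity has been seen.
    public bool HasError { get; private set; }

    // Matches the ValidationEventHandler delegate signature.
    public void ValidationCallbackOne(object sender, ValidationEventArgs args)
    {
        Events.Add(args);
        if (args.Severity == XmlSeverityType.Error)
        {
            HasError = true;
        }
    }
}
```

Because the callback never throws, the reader keeps going after a validation problem and the whole document still gets loaded.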

Then I get a reader:

XmlReader xvr = XmlReader.Create(filename, settings);

Then I read the file, which brings the validation in:

XmlDocument xdoc = new XmlDocument();
xdoc.Load(xvr);
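Applied to the question's setup, where the schema comes from the document's xsi:noNamespaceSchemaLocation hint rather than a prebuilt set, the same idea might look like the sketch below. `SinglePassLoader`, `LoadValidated` and the `problems` list are hypothetical names. The resolver is set explicitly because the default differs between .NET versions; note that XmlUrlResolver will fetch whatever location the document points at, so this assumes trusted input.

```csharp
using System.Collections.Generic;
using System.Xml;
using System.Xml.Schema;

public static class SinglePassLoader
{
    // Hypothetical helper: loads and validates in one pass. Validation
    // messages are appended to 'problems' instead of being thrown.
    public static XmlDocument LoadValidated(string filename, List<string> problems)
    {
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            CloseInput = true,
            // Needed to resolve the schema named in the schema location hint.
            XmlResolver = new XmlUrlResolver()
        };
        settings.ValidationFlags |= XmlSchemaValidationFlags.ProcessSchemaLocation;
        settings.ValidationFlags |= XmlSchemaValidationFlags.ReportValidationWarnings;
        settings.ValidationEventHandler += (sender, args) =>
            problems.Add(args.Severity + ": " + args.Message);

        var doc = new XmlDocument();
        using (var reader = XmlReader.Create(filename, settings))
        {
            // XmlDocument pulls nodes from the validating reader, so the
            // file is read once and validated as it is loaded.
            doc.Load(reader);
        }
        return doc;
    }
}
```

A caller would create an empty `List<string>`, pass it along with the path to MyBooks.xml, and treat a non-empty list afterwards as a validation failure.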

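The validation handler now holds the results. I also check that the loaded document element has a corresponding global element declaration in the schema set; when the root element's namespace is not covered by any schema in the set, the validator typically reports only warnings, so the document could otherwise pass without having been checked against anything.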

// Build the qualified name of the root element that was actually loaded.
XmlQualifiedName qn = XmlQualifiedName.Empty;
if (xdoc.DocumentElement != null)
{
    if (string.IsNullOrEmpty(xdoc.DocumentElement.NamespaceURI))
    {
        qn = new XmlQualifiedName(xdoc.DocumentElement.LocalName);
    }
    else
    {
        qn = new XmlQualifiedName(xdoc.DocumentElement.LocalName, xdoc.DocumentElement.NamespaceURI);
    }
}
// Valid only if there were no errors and the root is declared in the schema set.
return !(svh.HasError || qn.IsEmpty || (!xset.GlobalElements.Contains(qn)));