Why is entity &eacute; not valid whereas entity &lt; is?

Posted 2019-08-02 13:14

Question:

I'm looking at what the XML resolver System.Xml.Resolvers.XmlPreloadedResolver brings to the table in terms of DTDs, and I'm stumped by the fact that the XML reader recognizes the entity &lt; but not the entity &eacute;.

    using System.IO;
    using System.Xml;
    using System.Xml.Resolvers;

    internal static class Program
    {
        private static void Main(string[] args)
        {
            string invalidContent = "<?xml version=\"1.0\" encoding=\"utf-8\"?><key value=\"char &eacute; invalid\"/>";
            string validContent = "<?xml version=\"1.0\" encoding=\"utf-8\"?><key value=\"char &lt; valid\"/>";

            XmlDocument xmlDocument = new XmlDocument();

            var xmlReaderSettings = new XmlReaderSettings()
            {
                // DtdProcessing.Parse replaces the obsolete ProhibitDtd = false
                DtdProcessing = DtdProcessing.Parse,
                // Resolver with built-in copies of the XHTML 1.0 and RSS 0.91 DTDs
                XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.All)
            };

            using (XmlReader reader = XmlReader.Create(new StringReader(invalidContent), xmlReaderSettings))
            {
                xmlDocument.Load(reader); // XmlException: reference to undeclared entity 'eacute'
            }

            using (XmlReader reader = XmlReader.Create(new StringReader(validContent), xmlReaderSettings))
            {
                xmlDocument.Load(reader); // loads fine
            }
        }
    }

Looking inside XmlPreloadedResolver, I can see that XmlKnownDtds.All should bring in the xhtml-lat1.ent file, which contains the eacute entity along with many others. Any idea why I'm seeing this behavior?

Answer 1:

&lt; is one of the five entities predefined by the XML specification itself; &eacute; isn't. That's why you're seeing the difference in behaviour. (So I'd expect &amp;, &gt;, &apos; and &quot; to work too.) See http://www.w3.org/TR/REC-xml/#sec-references
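A minimal sketch to see the difference in isolation, assuming C# 9+ top-level statements; `TryReadValue` is a hypothetical helper, not something from the question. It parses both kinds of reference with no DTD in the document and no resolver configured: the five predefined entities come through, while &eacute; makes the reader throw.

    using System;
    using System.IO;
    using System.Xml;

    // Hypothetical helper: loads the XML with DTD parsing enabled and no XmlResolver set,
    // returning the "value" attribute, or null if the reader rejects the document.
    static string? TryReadValue(string xml)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };
        try
        {
            using var reader = XmlReader.Create(new StringReader(xml), settings);
            var doc = new XmlDocument();
            doc.Load(reader);
            return doc.DocumentElement?.GetAttribute("value");
        }
        catch (XmlException)
        {
            return null;
        }
    }

    // All five predefined entities work without any DTD.
    string predefined = "<key value=\"&lt; &gt; &amp; &apos; &quot;\"/>";
    Console.WriteLine(TryReadValue(predefined)); // < > & ' "

    // &eacute; is an HTML/XHTML name, not an XML one, so it is undeclared here.
    string undeclared = "<key value=\"char &eacute; invalid\"/>";
    Console.WriteLine(TryReadValue(undeclared) ?? "rejected: undeclared entity");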

I don't think the XmlResolver is particularly relevant here: your XML has no DOCTYPE and doesn't refer to any DTD, so the reader never asks the resolver for anything. I don't think it's meant to import entities automatically when nothing in the document itself refers to them.
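A sketch of what "referring to" a DTD could look like, assuming the document is allowed to carry an XHTML 1.0 Strict DOCTYPE (the DOCTYPE line is an illustrative addition, not part of the question). Once the reader sees the DOCTYPE, it asks the resolver for the DTD, and XmlPreloadedResolver can serve both the DTD and the xhtml-lat1.ent file it pulls in from its built-in copies. The root name `key` doesn't match the XHTML DTD, but since no ValidationType is set the DTD is only parsed for its declarations, not validated against.

    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.Resolvers;

    // Same payload as in the question, but the document now points at the
    // XHTML 1.0 Strict DTD, which XmlPreloadedResolver has built in.
    string withDoctype =
        "<?xml version=\"1.0\" encoding=\"utf-8\"?>" +
        "<!DOCTYPE key PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" " +
        "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">" +
        "<key value=\"char &eacute; valid\"/>";

    var settings = new XmlReaderSettings
    {
        DtdProcessing = DtdProcessing.Parse,
        // Serves the DTD and its .ent files from memory; with no fallback
        // resolver, nothing is fetched from w3.org.
        XmlResolver = new XmlPreloadedResolver(XmlKnownDtds.Xhtml10)
    };

    var doc = new XmlDocument();
    using (var reader = XmlReader.Create(new StringReader(withDoctype), settings))
    {
        doc.Load(reader);
    }

    string value = doc.DocumentElement!.GetAttribute("value");
    Console.WriteLine(value); // char é valid

XmlKnownDtds.Xhtml10 is enough here because only the XHTML entity sets are needed; XmlKnownDtds.All behaves the same for this document. If a DOCTYPE isn't an option, a numeric character reference such as &#233; needs no declaration at all.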



Tags: c# dtd xmlreader