Is there a way to prevent .NET's XmlReader class from expanding XML entities into their values when reading the content?
For instance, suppose the following XML is used as input:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE author PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN//XML" "http://www.oasis-open.org/docbook/xmlcharent/0.3/iso-lat1.ent" >
<author>&aacute;</author>
Let's assume it is not possible to reach the external OASIS DTD needed to expand the aacute entity. I would like the reader to read, in sequence, the author element, then the aacute node of type EntityReference, and finally the author end element, without throwing any errors. How can I achieve this?
UPDATE: I also want to prevent the expansion of character entities such as `&#xE1;`.
XML parsing is dangerous. In some cases it leads to CVEs and denial-of-service attacks, for example CVE-2016-3255.
It was also discussed at Black Hat EU 2013.
The most interesting document is XMLDTDEntityAttacks, which provides implementations and recommendations for developers.
Retrieve resources:
DoS:
Back to your question.
As @Evk wrote, by setting EntityHandling you can prevent the expansion of all entities except character entities.
I don't know of a way to prevent character entities from being expanded, other than writing your own XmlReader implementation.
I think you also want to prevent the parsing of `&amp;`, `&apos;`, `&lt;`, `&gt;` and `&quot;`.
FYI, this is how and where XmlTextReader parses character entities:
XmlTextReader
ParseElementContent (the & case)
ParseText (the char entity case)
ParseCharRefInline
ParseCharRefInline finally hands off to one of two functions:
ParseNumericCharRefInline parses numeric character entity references (e.g. `&#xE1;`).
ParseNamedCharRef parses named character entity references (`&amp;`, `&apos;`, `&lt;`, `&gt;`, `&quot;`).
One way to do that is to use `XmlTextReader` and set `EntityHandling` on it.
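A minimal sketch of the `XmlTextReader` approach, not necessarily the exact code this answer originally showed. It assumes the XML is available as a string (the `xml` constant below is a stand-in for the input in the question) and sets `XmlResolver` to null so that the reader does not try to fetch the unreachable DTD:

```csharp
using System;
using System.IO;
using System.Xml;

// Stand-in for the question's input: an external DTD that cannot be
// reached and a reference to the entity "aacute" that it would declare.
const string xml =
    "<?xml version=\"1.0\"?>\n" +
    "<!DOCTYPE author PUBLIC \"ISO 8879:1986//ENTITIES Added Latin 1//EN//XML\" " +
    "\"http://www.oasis-open.org/docbook/xmlcharent/0.3/iso-lat1.ent\">\n" +
    "<author>&aacute;</author>";

using var reader = new XmlTextReader(new StringReader(xml))
{
    // Expand only character entities; general entities such as &aacute;
    // are reported as EntityReference nodes instead of being expanded.
    EntityHandling = EntityHandling.ExpandCharEntities,
    // Do not attempt to download the external DTD.
    XmlResolver = null,
};

// Print each node type and name as the reader encounters it.
while (reader.Read())
{
    Console.WriteLine($"{reader.NodeType}: {reader.Name}");
}
```

With these settings the loop should report the author element, an EntityReference node named aacute, and the author end element. Numeric references such as `&#xE1;` are still expanded, as noted above.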
If that is not an option, you can do the same with XmlReader, but some reflection will be required (at least I am not aware of another way).
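A hedged sketch of what the reflection route could look like. It rests on an assumption that is not part of the public API: that the concrete reader returned by `XmlReader.Create` has a non-public `EntityHandling` property. This is an implementation detail that may differ between .NET versions, so the code checks for the property and fails explicitly if it is missing:

```csharp
using System;
using System.IO;
using System.Reflection;
using System.Xml;

// Stand-in for the question's input document.
const string xml =
    "<?xml version=\"1.0\"?>\n" +
    "<!DOCTYPE author PUBLIC \"ISO 8879:1986//ENTITIES Added Latin 1//EN//XML\" " +
    "\"http://www.oasis-open.org/docbook/xmlcharent/0.3/iso-lat1.ent\">\n" +
    "<author>&aacute;</author>";

var settings = new XmlReaderSettings
{
    // Allow the DOCTYPE declaration (the default, Prohibit, would throw).
    DtdProcessing = DtdProcessing.Parse,
    // Do not attempt to download the external DTD.
    XmlResolver = null,
};

using XmlReader reader = XmlReader.Create(new StringReader(xml), settings);

// ASSUMPTION: the runtime type of the reader exposes a settable instance
// property called "EntityHandling" that is not publicly accessible.
var property = reader.GetType().GetProperty(
    "EntityHandling",
    BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);

if (property is null || !property.CanWrite)
{
    throw new NotSupportedException(
        "This reader implementation has no settable EntityHandling property.");
}

// Same setting as with XmlTextReader: expand character entities only.
property.SetValue(reader, EntityHandling.ExpandCharEntities);

while (reader.Read())
{
    Console.WriteLine($"{reader.NodeType}: {reader.Name}");
}
```

Because this depends on internals, it is worth guarding with a test against the runtime you deploy on, and preferring `XmlTextReader` wherever that is possible.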