Force XML character entities into XmlDocument

Posted 2019-09-07 03:05

Question:

I have some XML that looks like this:

<abc x="{"></abc>

I want to force XmlDocument to write the brace as its XML character entity, i.e.:

<abc x="&#123;"></abc>

MSDN says this:

In order to assign an attribute value that contains entity references, the user must create an XmlAttribute node plus any XmlText and XmlEntityReference nodes, build the appropriate subtree and use SetAttributeNode to assign it as the value of an attribute.

CreateEntityReference sounded promising, so I tried this:

XmlDocument doc = new XmlDocument();
doc.LoadXml("<abc />");
XmlAttribute x = doc.CreateAttribute("x");
x.AppendChild(doc.CreateEntityReference("#123"));
doc.DocumentElement.Attributes.Append(x);

And I get the exception "Cannot create an 'EntityReference' node with a name starting with '#'."

Any reason why CreateEntityReference doesn't like the '#' - and more importantly how can I get the character entity into XmlDocument's XML? Is it even possible? I'm hoping to avoid string manipulation of the OuterXml...

Answer 1:

You're mostly out of luck.

First off, what you're dealing with are called character references, not entity references, which is why CreateEntityReference fails. The sole reason for a character reference to exist is to provide access to characters that would be illegal in a given context or otherwise difficult to enter.

Definition: A character reference refers to a specific character in the ISO/IEC 10646 character set, for example one not directly accessible from available input devices.

(See section 4.1 of the XML spec)
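To make the distinction concrete, here is a minimal sketch of how XmlDocument treats the two kinds of names. The class and method names (EntityReferenceDemo, CanCreateEntityReference) are made up for illustration; the only framework call is XmlDocument.CreateEntityReference, which throws ArgumentException for a name starting with '#'.

```csharp
using System;
using System.Xml;

public static class EntityReferenceDemo
{
    // Hypothetical helper: reports whether XmlDocument is willing to create
    // an XmlEntityReference node with the given name.
    public static bool CanCreateEntityReference(string name)
    {
        var doc = new XmlDocument();
        try
        {
            // Named entities such as "ntilde" are accepted here.
            doc.CreateEntityReference(name);
            return true;
        }
        catch (ArgumentException)
        {
            // "#123" is a character reference, not an entity name,
            // so the DOM refuses to build an entity reference node for it.
            return false;
        }
    }
}
```

Calling CanCreateEntityReference("ntilde") should return true, while CanCreateEntityReference("#123") should return false, which matches the exception in the question.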

When an XML processor encounters a character reference in the value of an attribute (that is, when the &#nnn; form is used inside an attribute), the reference is treated as "Included", which means the character is looked up and substituted for the reference text.

The string "AT&amp;T;" expands to "AT&T;" and the remaining ampersand is not recognized as an entity-reference delimiter.

(See section 4.4 of the XML spec)

This is baked into the XML spec and the Microsoft XML stack is doing what it's required to do: process character references.
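A short sketch of that behavior in XmlDocument itself: the character reference is resolved while parsing, so the DOM only ever holds the literal character and writes it back as such. The class and method names here (CharRefRoundTrip, AttributeValue, RoundTrip) are illustrative only.

```csharp
using System;
using System.Xml;

public static class CharRefRoundTrip
{
    // Returns the parsed value of attribute "x" on the root element.
    public static string AttributeValue(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.GetAttribute("x");
    }

    // Parses the markup and serializes the root element again.
    public static string RoundTrip(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.OuterXml;
    }
}
```

With the input <abc x="&#123;"></abc>, AttributeValue returns { and RoundTrip returns <abc x="{"></abc>: the reference is gone by the time the DOM exists, so there is nothing left for the serializer to preserve.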

The best I can see you doing is to take a peek at these old XML.com articles, one of which uses XSL to disable output escaping so &amp;#123; would turn into &#123; in the output.
http://www.xml.com/pub/a/2001/03/14/trxml10.html

<!DOCTYPE stylesheet [
<!ENTITY ntilde 
"<xsl:text disable-output-escaping='yes'>&amp;ntilde;</xsl:text>">
]>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="1.0">

  <xsl:output doctype-system="testOut.dtd"/>

  <xsl:template match="test">
    <testOut>
      The Spanish word for "Spain" is "Espa&ntilde;a".
      <xsl:apply-templates/>
    </testOut>
  </xsl:template>

</xsl:stylesheet>

And this one, which uses an XSLT 2.0 character map to convert specific characters into other text sequences on output (to accomplish the same goal as the previous link).
http://www.xml.com/lpt/a/1426

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output use-character-maps="cm1"/>

  <xsl:character-map name="cm1">
    <xsl:output-character character="&#160;" string="&amp;nbsp;"/>   
    <xsl:output-character character="&#233;" string="&amp;233;"/> <!-- é -->
    <xsl:output-character character="ô" string="&amp;#244;"/>
    <xsl:output-character character="&#8212;" string="--"/>
  </xsl:character-map>

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
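
The character map above needs an XSLT 2.0 processor, and .NET's built-in XslCompiledTransform only implements XSLT 1.0. As an alternative that is not from either linked article, the same "replace this character with this string on output" idea can be sketched at the writer level. Everything named here (CharRefXmlWriter, CharRefSerializer, the forcedChars parameter) is hypothetical; the sketch assumes XmlDocument writes attribute text through XmlWriter.WriteString, and it applies to element text as well as attribute values.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Xml;

// Hypothetical writer: emits the selected characters as decimal
// character references instead of as literal characters.
public sealed class CharRefXmlWriter : XmlTextWriter
{
    private readonly string _forcedChars;

    public CharRefXmlWriter(TextWriter output, string forcedChars) : base(output)
    {
        _forcedChars = forcedChars;
    }

    public override void WriteString(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            base.WriteString(text);
            return;
        }

        int start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (_forcedChars.IndexOf(text[i]) < 0) continue;

            // Write the ordinary run before this character with normal escaping.
            if (i > start) base.WriteString(text.Substring(start, i - start));

            // Write the reference verbatim; WriteRaw bypasses escaping.
            int codePoint = text[i];
            WriteRaw("&#" + codePoint.ToString(CultureInfo.InvariantCulture) + ";");
            start = i + 1;
        }

        if (start < text.Length) base.WriteString(text.Substring(start));
    }
}

public static class CharRefSerializer
{
    // Serializes a DOM node, forcing the given characters to character references.
    public static string Serialize(XmlNode node, string forcedChars)
    {
        var buffer = new StringWriter();
        var writer = new CharRefXmlWriter(buffer, forcedChars);
        node.WriteTo(writer);
        writer.Flush();
        return buffer.ToString();
    }
}
```

Loading <abc x="{"></abc> into an XmlDocument and calling CharRefSerializer.Serialize(doc.DocumentElement, "{}") should produce <abc x="&#123;"></abc>. Any conforming parser that reads this output back will see the plain { again, so this only changes the bytes on the wire, not the document's content.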


Answer 2:

You should write your string literals as verbatim strings, with a preceding @, like so: @"My /?.,<> STRING". I don't know whether that will solve your issue, though. I would approach the problem using the XmlNode class from XmlDocument. You can use its Attributes property, and it'll be much easier. Check it out here: http://msdn.microsoft.com/en-us/library/system.xml.xmlnode.attributes.aspx