Escaping Unicode string in XmlElement despite writ

Posted 2019-02-17 22:24

For a given XmlElement, I need to be able to set the inner text to an escaped version of the Unicode string, despite the document ultimately being encoded in UTF-8. Is there any way of achieving this?

Here's a simple version of the code:

using System.IO;
using System.Text;
using System.Xml;

// Build a document whose root element is set from a character reference.
const string text = "&#xF1;";

var document = new XmlDocument {PreserveWhitespace = true};
var root = document.CreateElement("root");
root.InnerXml = text;
document.AppendChild(root);

// Serialize as UTF-8 without an XML declaration.
var settings = new XmlWriterSettings {Encoding = Encoding.UTF8, OmitXmlDeclaration = true};
using (var stream = new FileStream("out.xml", FileMode.Create))
using (var writer = XmlWriter.Create(stream, settings))
    document.WriteTo(writer);

Expected:

<root>&#xF1;</root>

Actual:

<root>ñ</root>

Using an XmlWriter directly and calling WriteRaw(text) works, but I only have access to an XmlDocument, and the serialization happens later. On the XmlElement, InnerText escapes the & to &amp;, as expected, and setting Value throws an exception.

Is there some way of setting the inner text of an XmlElement to the escaped ASCII text, regardless of the encoding that is ultimately used? I feel like I must be missing something obvious, or it's just not possible.

1 answer
男人必须洒脱
#2 · 2019-02-17 23:04

If you ask XmlWriter to produce ASCII output, it should give you character references for all non-ASCII content.

var settings = new XmlWriterSettings {Encoding = Encoding.ASCII, OmitXmlDeclaration = true};

The output is still valid UTF-8, because ASCII is a subset of UTF-8.
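
For completeness, here is a minimal sketch of the whole round trip with that setting, assuming the XML is written to a Stream (a MemoryStream here instead of the question's out.xml). The class name AsciiXmlDemo and the helper ToAsciiXml are hypothetical names made up for this illustration, not part of any library.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class AsciiXmlDemo
{
    // Hypothetical helper: serializes the document through an ASCII-encoded
    // XmlWriter and returns the bytes decoded back into a string.
    public static string ToAsciiXml(XmlDocument document)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII, OmitXmlDeclaration = true };
        using (var stream = new MemoryStream())
        {
            // Disposing the writer flushes it; the stream itself stays open.
            using (var writer = XmlWriter.Create(stream, settings))
                document.WriteTo(writer);
            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }

    public static void Main()
    {
        var document = new XmlDocument { PreserveWhitespace = true };
        var root = document.CreateElement("root");
        root.InnerText = "\u00F1"; // the single character U+00F1
        document.AppendChild(root);

        Console.WriteLine(ToAsciiXml(document)); // expected: <root>&#xF1;</root>
    }
}
```

Note that the sketch writes to a Stream on purpose. As far as I know, when XmlWriter.Create is given a TextWriter or StringBuilder, the Encoding in the settings does not drive the byte encoding, so non-ASCII characters would probably come out unescaped there.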
