XmlWriter inserting spaces when xml:space=preserve

2019-05-05 11:46发布

问题:

Given this code (C#, .NET 3.5 SP1):

var doc = new XmlDocument();
doc.LoadXml("<?xml version=\"1.0\"?><root>"
    + "<value xml:space=\"preserve\">"
    + "<item>content</item>"
    + "<item>content</item>"
    + "</value></root>");

var text = new StringWriter();
var settings = new XmlWriterSettings() { Indent = true, CloseOutput = true };
using (var writer = XmlWriter.Create(text, settings))
{
    doc.DocumentElement.WriteTo(writer);
}

var xml = text.GetStringBuilder().ToString();
Assert.AreEqual("<?xml version=\"1.0\" encoding=\"utf-16\"?>\r\n<root>\r\n"
    + "  <value xml:space=\"preserve\"><item>content</item>"
    + "<item>content</item></value>\r\n</root>", xml);

The assertion fails because the XmlWriter is inserting a newline and indent around the <item> elements, which would seem to contradict the xml:space="preserve" attribute.

I am trying to take input with no whitespace (or only significant whitespace, and already loaded into an XmlDocument) and pretty-print it without adding any whitespace inside elements marked to preserve whitespace (for obvious reasons).

Is this a bug or am I doing something wrong? Is there a better way to achieve what I'm trying to do?

Edit: I should probably add that I do have to use an XmlWriter with Indent=true on the output side. In the "real" code, this is being passed in from outside of my code.

回答1:

Ok, I've found a workaround.

It turns out that XmlWriter does the correct thing if there actually is any whitespace within the xml:space="preserve" block -- it's only when there isn't any that it screws up and adds some. And conveniently, this also works if there are some whitespace nodes, even if they're empty. So the trick that I've come up with is to decorate the document with extra 0-length whitespace in the appropriate places before trying to write it out. The result is exactly what I want: pretty printing everywhere except where whitespace is significant.

The workaround is to change the inner block to:

PreserveWhitespace(doc.DocumentElement);
doc.DocumentElement.WriteTo(writer);

...

private static void PreserveWhitespace(XmlElement root)
{
    var nsmgr = new XmlNamespaceManager(root.OwnerDocument.NameTable);
    foreach (var element in root.SelectNodes("//*[@xml:space='preserve']", nsmgr)
        .OfType<XmlElement>())
    {
        if (element.HasChildNodes && !(element.FirstChild is XmlSignificantWhitespace))
        {
            var whitespace = element.OwnerDocument.CreateSignificantWhitespace("");
            element.InsertBefore(whitespace, element.FirstChild);
        }
    }
}

I'm still thinking that this behaviour of XmlWriter is a bug, though.