-->

XmlReader newline \n instead of \r\n

Posted 2019-02-16 14:53

Question:

When I use XmlReader.ReadOuterXml(), elements are separated by \n instead of \r\n. So, for example, if I have an XmlDocument representation of

<A>
<B>
</B>
</A>

I get

<A>\n<B>\n</B>\n</A>

Is there an option to specify the newline character? XmlWriterSettings has one, but XmlReader doesn't seem to.

Here is my code to read the XML. Note that XmlWriterSettings has NewLineHandling = Replace by default.

XmlDocument xmlDocument = <Generate some XmlDocument>
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;

// Use a memory stream because it accepts UTF8 characters.  If we use a 
// string builder the XML will be UTF16.
using (MemoryStream memStream = new MemoryStream())
{
    using (XmlWriter xmlWriter = XmlWriter.Create(memStream, settings))
    {
        xmlDocument.Save(xmlWriter);
    }

    //Set the pointer back to the beginning of the stream to be read
    memStream.Position = 0;
    using (XmlReader reader = XmlReader.Create(memStream))
    {
        reader.Read();
        string header = reader.Value;
        reader.MoveToContent();
        return "<?xml " + header + " ?>" + Environment.NewLine + reader.ReadOuterXml();
    }
}

Answer 1:

XmlReader automatically normalizes \r\n to \n. Although this seems unusual on Windows, it is required by the XML specification, section 2.11, End-of-Line Handling (http://www.w3.org/TR/2008/REC-xml-20081126/#sec-line-ends).

You can do a String.Replace:

string s = reader.ReadOuterXml().Replace("\n", "\r\n");
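
One caveat with a plain Replace: if the string already contains some \r\n pairs, they become \r\r\n. A minimal sketch of a more careful variant follows. The helper name ToCrLf is hypothetical and not part of the original answer; it converts only a \n that is not already preceded by \r.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (an assumption for illustration): converts only bare "\n"
// to "\r\n" and leaves existing "\r\n" pairs untouched.
string ToCrLf(string text)
{
    // (?<!\r)\n matches a line feed that is not preceded by a carriage return.
    return Regex.Replace(text, "(?<!\r)\n", "\r\n");
}

// Mixed input: two bare "\n" and one existing "\r\n".
string mixed = "<A>\n<B>\r\n</B>\n</A>";
Console.WriteLine(ToCrLf(mixed) == "<A>\r\n<B>\r\n</B>\r\n</A>"); // True
```

Like the plain Replace, this still rewrites every bare \n, including any inside text content, so it only suits cases where that is acceptable.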


Answer 2:

I had to write database data to an xml file and read it back from the xml file, using LINQ to XML. Some fields in a record were themselves xml strings complete with \r characters. These had to remain intact. I spent days trying to find something that would work, but it seems Microsoft was by design converting \r to \n.

The following solution works for me:

To write a loaded XDocument to the XML file keeping \r intact, where xDoc is an XDocument and filePath is a string:

XmlWriterSettings xmlWriterSettings = new XmlWriterSettings 
    { NewLineHandling = NewLineHandling.None, Indent = true };
using (XmlWriter xmlWriter = XmlWriter.Create(filePath, xmlWriterSettings))
{
    xDoc.Save(xmlWriter);
    xmlWriter.Flush();
}

To read an XML file into an XElement keeping \r intact:

using (XmlTextReader xmlTextReader = new XmlTextReader(filePath) 
   { WhitespaceHandling = WhitespaceHandling.Significant })
{
     xmlTextReader.MoveToContent();
     xDatabaseElement = XElement.Load(xmlTextReader);
}


Answer 3:

Solution 1: Write entitized XML

Use a properly configured XmlWriter with the NewLineHandling.Entitize option so that the XmlReader will not normalize the line endings away.

You can use such a custom XmlWriter even with XDocument:

using (XmlWriter writer = XmlWriter.Create(fileName, new XmlWriterSettings { NewLineHandling = NewLineHandling.Entitize }))
    xDoc.Save(writer); // disposing the writer flushes the output to the file
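
To see why entitizing helps, here is a small round-trip sketch. It writes to an in-memory StringWriter instead of a file, and the sample element and its text are made up for illustration. The \r is written as the character reference &#xD;, which a normalizing reader turns back into \r rather than dropping it.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Sample document (an assumption for illustration) whose text contains "\r\n".
var xDoc = new XDocument(new XElement("a", "line1\r\nline2"));
var settings = new XmlWriterSettings { NewLineHandling = NewLineHandling.Entitize };

// Write to a string rather than a file so the sketch is self-contained.
var buffer = new StringWriter();
using (XmlWriter writer = XmlWriter.Create(buffer, settings))
{
    xDoc.Save(writer);
}
string written = buffer.ToString();

// Read it back with the normal (normalizing) reader used by XDocument.Parse.
string roundTripped = XDocument.Parse(written).Root.Value;

Console.WriteLine(written.Contains("&#xD;"));          // True
Console.WriteLine(roundTripped == "line1\r\nline2");   // True
```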

Solution 2: Read non-entitized XML without normalization

Solution 1 is the cleaner way; however, you may already have non-entitized XML whose creation you cannot change, and still want to prevent normalization. The accepted answer suggests a replace, but that blindly replaces every \n occurrence, even where that is not desirable. To retrieve all of the line endings as they are in the file, you can try the legacy XmlTextReader class, which does not normalize XML files by default. You can use it with XDocument, too:

var xDoc = XDocument.Load(new XmlTextReader(fileName));
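
The difference between the two readers can be checked with a short sketch. It reads from an in-memory string instead of a file, and the sample XML is made up for illustration. With default settings, the reader from XmlReader.Create should return the text with \n only, while XmlTextReader (whose Normalization property defaults to false) should keep \r\n.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Sample input (an assumption for illustration) with a CRLF inside element text.
string xml = "<a>line1\r\nline2</a>";

// XmlReader.Create normalizes line endings, as the XML specification requires.
string normalized;
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    normalized = XDocument.Load(reader).Root.Value;
}

// The legacy XmlTextReader leaves Normalization = false unless told otherwise.
string raw;
using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
{
    raw = XDocument.Load(reader).Root.Value;
}

Console.WriteLine("XmlReader.Create keeps \\r: " + normalized.Contains("\r"));
Console.WriteLine("XmlTextReader keeps \\r:    " + raw.Contains("\r"));
```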


Answer 4:

There's a quicker way if you're just trying to get to UTF-8. First create a writer:

public class EncodedStringWriter : StringWriter
{
    public EncodedStringWriter(StringBuilder sb, Encoding encoding)
        : base(sb)
    {
        _encoding = encoding;
    }

    private Encoding _encoding;

    public override Encoding Encoding
    {
        get
        {
            return _encoding;
        }
    }

}

Then use it:

XmlDocument doc = new XmlDocument();
doc.LoadXml("<foo><bar /></foo>");

StringBuilder sb = new StringBuilder();
XmlWriterSettings xws = new XmlWriterSettings();
xws.Indent = true;

using( EncodedStringWriter w = new EncodedStringWriter(sb, Encoding.UTF8) )
{
    using( XmlWriter writer = XmlWriter.Create(w, xws) )
    {
        doc.WriteTo(writer);
    }
}
string xml = sb.ToString();

Gotta give credit where credit is due.



Answer 5:

XmlReader reads files; it does not write them. If you are getting \n in your reader, it is because that's what's in the file. Both \n and \r are whitespace and are semantically the same in XML, so the difference will not affect the meaning or content of the data.

Edit:

That looks like C#, not Ruby. As binarycoder says, ReadOuterXml is defined to return normalized XML. Typically this is what you want. If you want the raw XML you should use Encoding.UTF8.GetString(memStream.ToArray()), not XmlReader.
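
A minimal sketch of that last suggestion, under two assumptions that are not in the question's code: NewLineChars is set explicitly to \r\n so the result does not depend on the platform default, and a BOM-less UTF8Encoding is used so the decoded string does not start with a byte order mark.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

var doc = new XmlDocument();
doc.LoadXml("<A><B /></A>");

var settings = new XmlWriterSettings
{
    Indent = true,
    NewLineChars = "\r\n",              // assumption: force CRLF explicitly
    Encoding = new UTF8Encoding(false)  // assumption: UTF-8 without a BOM
};

string xml;
using (var memStream = new MemoryStream())
{
    using (XmlWriter writer = XmlWriter.Create(memStream, settings))
    {
        doc.Save(writer);
    }
    // Decode the bytes exactly as written; no XmlReader, so no normalization.
    xml = Encoding.UTF8.GetString(memStream.ToArray());
}

Console.WriteLine(xml);
```

Because the string is decoded straight from the writer's output, the line breaks are whatever the XmlWriter produced, and no second parsing pass is needed.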