
Why is the XmlWriter always outputting utf-16 encoding?

Posted 2019-02-21 09:38


I have this extension method:

    public static string SerializeObject<T>(this T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(true),
            Indent = false,
            OmitXmlDeclaration = false,
            NewLineHandling = NewLineHandling.None
        };

        using (var stringWriter = new StringWriter())
        {
            using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                serializer.Serialize(xmlWriter, value);
            }

            return stringWriter.ToString();
        }
    }

But whenever I call this, the output declares an encoding of utf-16, i.e. <?xml version="1.0" encoding="utf-16"?>. What am I doing wrong?


Strings are UTF-16, so writing to a StringWriter will always use UTF-16. If that's not what you want, then use some other TextWriter derived class, with the encoding you like.
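One way to follow this advice is to skip the TextWriter entirely and write to a Stream, because XmlWriter honors XmlWriterSettings.Encoding when its target is a stream but takes the encoding from the TextWriter when given one. The sketch below is only an illustration: the names XmlSerializationSketch, SerializeToUtf8Bytes and Note are hypothetical and do not come from the question.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical sample type, used only for this illustration.
public class Note
{
    public string Text { get; set; }
}

public static class XmlSerializationSketch
{
    // Serializes to UTF-8 bytes; the declaration then reads encoding="utf-8".
    public static byte[] SerializeToUtf8Bytes<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings
        {
            // false = no byte order mark, so the bytes decode to a clean string.
            Encoding = new UTF8Encoding(false),
            Indent = false,
            OmitXmlDeclaration = false
        };

        using (var stream = new MemoryStream())
        {
            // The target is a Stream, so settings.Encoding is actually used.
            using (var xmlWriter = XmlWriter.Create(stream, settings))
            {
                serializer.Serialize(xmlWriter, value);
            }

            return stream.ToArray();
        }
    }
}
```

If a string is still needed, Encoding.UTF8.GetString(XmlSerializationSketch.SerializeToUtf8Bytes(new Note { Text = "hi" })) should give text whose declaration says utf-8, although the string itself is of course held as UTF-16 in memory.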


As far as I know, the StringWriter class always reports UTF-16 as its encoding when serializing to a string. You can write your own derived class that accepts a different encoding:

    public class StringWriterWithEncoding : StringWriter
    {
        private readonly Encoding _encoding;

        public StringWriterWithEncoding()
        {
        }

        public StringWriterWithEncoding(IFormatProvider formatProvider)
            : base(formatProvider)
        {
        }

        public StringWriterWithEncoding(StringBuilder sb)
            : base(sb)
        {
        }

        public StringWriterWithEncoding(StringBuilder sb, IFormatProvider formatProvider)
            : base(sb, formatProvider)
        {
        }

        public StringWriterWithEncoding(Encoding encoding)
        {
            _encoding = encoding;
        }

        public StringWriterWithEncoding(IFormatProvider formatProvider, Encoding encoding)
            : base(formatProvider)
        {
            _encoding = encoding;
        }

        public StringWriterWithEncoding(StringBuilder sb, Encoding encoding)
            : base(sb)
        {
            _encoding = encoding;
        }

        public StringWriterWithEncoding(StringBuilder sb, IFormatProvider formatProvider, Encoding encoding)
            : base(sb, formatProvider)
        {
            _encoding = encoding;
        }

        // Fall back to the base (UTF-16) encoding when none was supplied.
        public override Encoding Encoding
        {
            get { return _encoding ?? base.Encoding; }
        }
    }

So you can use this instead:

    using (var stringWriter = new StringWriterWithEncoding(Encoding.UTF8))


You should derive a new class from StringWriter which has an overridden Encoding property.


If you do not want to use a class that derives from StringWriter, then in your case you could simply set OmitXmlDeclaration to true and write your own declaration, as I do below:

    public static string Serialize<T>(this T value, string xmlDeclaration = "<?xml version=\"1.0\"?>") where T : class, new()
    {
        if (value == null) return string.Empty;

        var settings = new XmlWriterSettings
        {
            Indent = true,
            // Suppress the writer's own utf-16 declaration when a custom one is supplied.
            OmitXmlDeclaration = xmlDeclaration != null
        };

        using (var stringWriter = new StringWriter())
        {
            using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                var xmlSerializer = new XmlSerializer(typeof(T));

                xmlSerializer.Serialize(xmlWriter, value);
            }

            // The XmlWriter is disposed (and flushed) here, so the output is complete.
            // With no custom declaration, the writer's own declaration is already in place.
            if (xmlDeclaration == null) return stringWriter.ToString();

            return xmlDeclaration + Environment.NewLine + stringWriter.ToString();
        }
    }


As the accepted answer says, StringWriter is UTF-16 (Unicode) by default and by design. If you want to end up with a string whose declaration says UTF-8, here are two ways to get it done:

Solution #1 (not very efficient, bad practice, but gets the job done): Dump it to a text file and read it back in, delete the file (probably only suitable for small files, if you even want to do this at all - just wanted to show it could be done!)

    public static string SerializeObject<T>(this T value)
    {
        const string path = "MyFile.xml";

        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(true),
            Indent = false,
            OmitXmlDeclaration = false,
            NewLineHandling = NewLineHandling.None
        };

        // Writing to a file (not a TextWriter) lets XmlWriter honor settings.Encoding.
        using (var xmlWriter = XmlWriter.Create(path, settings))
        {
            serializer.Serialize(xmlWriter, value);
        }

        // Read the file back in, then delete it.
        var xml = new XmlDocument();
        xml.Load(path);
        File.Delete(path);

        // OuterXml includes the utf-8 declaration that was written to the file.
        return xml.OuterXml;
    }


Solution #2 (better, easier, more elegant solution!): Do it like you have it, using StringWriter, but derive a class from it that overrides the Encoding property to return UTF-8:

    public static string SerializeObject<T>(this T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(true),
            Indent = false,
            OmitXmlDeclaration = false,
            NewLineHandling = NewLineHandling.None
        };

        using (var stringWriter = new UTF8StringWriter())
        {
            using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
            {
                serializer.Serialize(xmlWriter, value);
            }

            return stringWriter.ToString();
        }
    }

    public class UTF8StringWriter : StringWriter
    {
        public override Encoding Encoding
        {
            get { return Encoding.UTF8; }
        }
    }
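To see the effect of the override in isolation, here is a small comparison sketch. It assumes a hypothetical Person type and uses the hypothetical names Utf8DeclaringStringWriter and DeclarationDemo; none of these appear in the question, and the writer class simply mirrors the UTF8StringWriter idea above.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical sample type, used only for this comparison.
public class Person
{
    public string Name { get; set; }
}

// Mirrors the UTF8StringWriter idea: only the reported encoding changes.
public sealed class Utf8DeclaringStringWriter : StringWriter
{
    public override Encoding Encoding
    {
        get { return Encoding.UTF8; }
    }
}

public static class DeclarationDemo
{
    // Serializes value through whichever StringWriter is passed in.
    public static string Serialize(StringWriter writer, object value)
    {
        var serializer = new XmlSerializer(value.GetType());

        // XmlWriter takes the declared encoding from writer.Encoding.
        using (var xmlWriter = XmlWriter.Create(writer))
        {
            serializer.Serialize(xmlWriter, value);
        }

        return writer.ToString();
    }
}
```

Calling DeclarationDemo.Serialize(new StringWriter(), new Person { Name = "Ada" }) should produce a declaration with encoding="utf-16", while passing new Utf8DeclaringStringWriter() instead should produce encoding="utf-8". Either way the result is still an ordinary .NET string, so if it is later saved to a file or sent over the wire it has to be encoded as UTF-8 at that point for the declaration to be truthful.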