Convert utf-8 XML document to utf-16 for inserting

Posted 2019-08-12 02:40

Question:

I have an XML document that was created with UTF-8 encoding. I want to store it in a SQL Server 2008 xml column, but I understand that I need to convert it to UTF-16 first.

I've tried using XDocument to do this, but I'm not getting valid XML after the conversion. Here is the conversion I tried (Utf8StringWriter is a small class that inherits from StringWriter and overrides Encoding):

XDocument xDoc = XDocument.Parse(utf8Xml);
StringWriter writer = new StringWriter();
XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings() 
                { Encoding = writer.Encoding, Indent = true });

xDoc.WriteTo(xml);

string utf16Xml = writer.ToString();

The data in utf16Xml is invalid, and when I try to insert it into the database I get this error:

{"XML parsing: line 1, character 38, unable to switch the encoding"}

However, the initial utf8Xml data is definitely valid and contains all the information I need.

UPDATE: The initial XML is obtained by using XmlSerializer (with a Utf8StringWriter class) to create the XML string from an existing object model (engine). The code for this is:

public static void Serialise<T>(T engine, ref StringWriter writer)
{
    XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings() { Encoding = writer.Encoding });

    XmlSerializer xs = new XmlSerializer(engine.GetType());

    xs.Serialize(xml, engine);
}

I have to leave this as it is, because that code is outside my control.

Before I even send the utf16Xml string to the failing database call, I can view it in the Visual Studio debugger. There I notice that the whole string is not present, and the XML viewer reports a "string literal was not closed" error instead.

Answer 1:

The error is on the first line, XDocument xDoc = XDocument.Parse(utf8Xml);. Most likely you converted a UTF-8 stream into a string (utf8Xml), but the encoding declared inside that string is still utf-8, so the XML reader fails. If that is the case, load the XML directly from the stream with Load instead of converting it to a string first.
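
A minimal sketch of loading from the bytes rather than from a string, assuming the UTF-8 data is available as a byte array. The class and method names (XmlLoadSketch, LoadFromUtf8Bytes) are made up for illustration and are not part of the asker's code:

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;

static class XmlLoadSketch
{
    // Hypothetical helper: parses XML straight from its encoded bytes.
    public static XDocument LoadFromUtf8Bytes(byte[] utf8Bytes)
    {
        using (var stream = new MemoryStream(utf8Bytes))
        {
            // XDocument.Load(Stream) lets the XML reader work out the encoding
            // from the byte order mark or the XML declaration.
            return XDocument.Load(stream);
        }
    }
}
```

If the data already lives in a file or network stream, that stream can be passed to XDocument.Load directly instead of going through a MemoryStream.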



Answer 2:

Set the encoding of the document to UTF-16 after you have parsed it from utf8Xml:

XDocument xDoc = XDocument.Parse(utf8Xml);
xDoc.Declaration.Encoding = "utf-16";
StringWriter writer = new StringWriter();

// Dispose the XmlWriter before reading the string so its buffered output is flushed.
using (XmlWriter xml = XmlWriter.Create(writer, new XmlWriterSettings()
                { Encoding = writer.Encoding, Indent = true }))
{
    xDoc.WriteTo(xml);
}

string utf16Xml = writer.ToString();
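
One thing to watch with this approach: if the input has no <?xml ...?> line, xDoc.Declaration is null, and assigning to its Encoding throws a NullReferenceException. A small guard might look like the sketch below. DeclarationSketch and DeclareUtf16 are illustrative names, not anything from the original post:

```csharp
using System.Xml.Linq;

static class DeclarationSketch
{
    // Hypothetical helper: makes sure the document's declaration names utf-16.
    public static void DeclareUtf16(XDocument doc)
    {
        if (doc.Declaration == null)
        {
            // The input had no XML declaration, so create one.
            doc.Declaration = new XDeclaration("1.0", "utf-16", null);
        }
        else
        {
            doc.Declaration.Encoding = "utf-16";
        }
    }
}
```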


Answer 3:

Here's what I had to do to make it work. This just converts the XML to UTF-16:

string getUtf16Xml(System.Xml.XmlDocument xmlDoc)
{    
   System.Xml.Linq.XDocument xDoc = System.Xml.Linq.XDocument.Parse(xmlDoc.OuterXml);
   xDoc.Declaration.Encoding = "utf-16";

   return xDoc.ToString();    
}

Then I can save the results to the DB.
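
For anyone comparing approaches, here is a rough sketch of two ways to get a string out of an XDocument. As far as I know, XDocument.ToString() returns the markup without the XML declaration, while saving through a StringWriter writes a declaration that names utf-16, because that is the encoding StringWriter reports. Utf16XmlSketch and its method names are invented for this example:

```csharp
using System.IO;
using System.Xml.Linq;

static class Utf16XmlSketch
{
    // Hypothetical helper: re-serializes XML text with a utf-16 declaration.
    public static string ToUtf16Xml(string xmlText)
    {
        XDocument doc = XDocument.Parse(xmlText);
        using (var writer = new StringWriter())
        {
            // Save creates and disposes its own XmlWriter, so the output is
            // fully flushed by the time ToString() is called.
            doc.Save(writer);
            return writer.ToString();
        }
    }

    // For comparison: ToString() leaves the declaration out entirely.
    public static string WithoutDeclaration(string xmlText)
    {
        return XDocument.Parse(xmlText).ToString();
    }
}
```

Either string avoids declaring utf-8 inside text that .NET holds as UTF-16, which is the mismatch behind the "unable to switch the encoding" error in the question.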



Tags: c# xml encoding