Probably a duplicate of unanswered. SQL Server 2008 - Add XML Declaration to XML Output
Please let me know if this is possible. I read in some blogs
http://forums.asp.net/t/1455808.aspx/1
http://www.devnewsgroups.net/group/microsoft.public.sqlserver.xml/topic60022.aspx
But I couldn't understand why I can't do this.
When I read this post I thought it was the "end of the line"... no solution... I almost gave up on the approach... but in fact there is a way to work around this limitation by converting the XML to varchar(max) and then appending the declaration to the beginning of the string. The following post shows how:
Using SQL Server "FOR XML": Convert Result Datatype to Text/varchar/string whatever?
A simple example would look something like this:
The Output:
You have to add it manually. SQL Server always stores xml internally as ucs-2 so it is impossible for SQL to generate it a utf-8 encoding header
See "Limitations of the xml Data Type" on MSDN
The accepted answer of "add it manually", while technically correct, is incomplete and hence misleading. Simply adding the XML declaration with whatever "encoding" you want doesn't change the actual encoding of the string. This is sometimes ok. If you specify "UTF-8" and convert the
XML
data toVARCHAR
, then as long as all of the characters are standard ASCII characters (values 1 - 127), then sure, it's UTF-8 (at least there is no noticeable difference). BUT, if there are any characters with values 128 or above, then you do not have a UTF-8 encoded XML document. And if you convert the XML data toNVARCHAR
, then you have a UTF-16 encoded document, regardless of what you manually specify in the XML declaration. You should only be specifying an encoding IF it is the actual encoding being used.And until SQL Server 2019 (currently in beta at CTP 2.1), there was no way to get the encoding to be UTF-8 within SQL Server, at least not without using SQLCLR. But in SQL Server 2019, you can now convert the XML to actual UTF-8:
That returns:
I have been working with this matter during the last days, and although there might be better solutions, I have ended up quite happy with this bash script:
Basically, this script will get a file generated from the horrible bcp software, which generates incomplete and invalid XML data, convert it from the UCS-2 format to UTF-8 (first row), and add at the beginning and end of the file what it needs (second row of the script) to be valid and complete.
It works for me. The script I used to generate the XML file with BCP is: