TSQL FOR XML Exporting ACTUAL UTF-8 Encoded data

2019-08-11 23:25发布

问题:

We have a process that writes XML (Using SQL's FOR XML). When it was executed via SQLCMD in a batch file the output was in UTF-8 format (specifically 8-bit ascii characters become 2 byte). When I do the same thing through Execute SQL Command in SSIS it's not UTF-8 encoded.

Here's a simple example. The ® should become 2 bytes:

SELECT 'Diversity® Certified' as fldAgentLastName
FOR XML PATH('agent'), ELEMENTS, TYPE, ROOT('agents')

The output is: Diversity® Certified

it SHOULD be: Diversity® Certified

and was using SQLCMD. I understand that internally XML is stored as UCS-2(?), but I need a way to get the output as UTF-8 encoded data (not just 8-bit).

I also cannot use the BCP trick I've seen mentioned.

I don't want to use the CDATA tag because that would entail recreating a giant ugly query.

Everything I've found on the web doesn't encode the high ascii characters.

This is running on SQL Server 2008 R2.

回答1:

I guess all I had to do was ask, and then I'd find my own answer: The problem wasn't with SQL, it was downstream in SSIS. I changed the codepage for a CONVERT as well as the final Text file to 65001 based on this answer in another thread:

...The workaround is simple albeit counterintuitive - add a Data Conversion Transformation step between the OLE DB Source and the Flat-File Destination that converts your input "Dat" column from DT_NTEXT to DT_TEXT with a codepage of 65001. Then you feed the newly transformed column directly to the output column in your flat-file dest. ... Regards, Jacob

http://www.sqlservercentral.com/Forums/Topic719421-149-1.aspx