We have a process that writes XML (Using SQL's FOR XML). When it was executed via SQLCMD in a batch file the output was in UTF-8 format (specifically 8-bit ascii characters become 2 byte). When I do the same thing through Execute SQL Command in SSIS it's not UTF-8 encoded.
Here's a simple example. The ® should become 2 bytes:
SELECT 'Diversity® Certified' as fldAgentLastName
FOR XML PATH('agent'), ELEMENTS, TYPE, ROOT('agents')
The output is: Diversity® Certified
it SHOULD be: Diversity® Certified
and was using SQLCMD. I understand that internally XML is stored as UCS-2(?), but I need a way to get the output as UTF-8 encoded data (not just 8-bit).
I also cannot use the BCP trick I've seen mentioned.
I don't want to use the CDATA tag because that would entail recreating a giant ugly query.
Everything I've found on the web doesn't encode the high ascii characters.
This is running on SQL Server 2008 R2.
I guess all I had to do was ask, and then I'd find my own answer: The problem wasn't with SQL, it was downstream in SSIS. I changed the codepage for a CONVERT as well as the final Text file to 65001 based on this answer in another thread:
...The workaround is simple albeit counterintuitive - add a Data Conversion Transformation step between the OLE DB Source and the Flat-File Destination that converts your input "Dat" column from DT_NTEXT to DT_TEXT with a codepage of 65001. Then you feed the newly transformed column directly to the output column in your flat-file dest. ... Regards, Jacob
http://www.sqlservercentral.com/Forums/Topic719421-149-1.aspx