bcp queryout xml format file encoding utf-8 xml fi

2020-05-06 13:03发布

问题:

I have to generate xml file for the format of encoding="UTF-8. I used bcp queryout. xml file was generating. But my issue was the xml file validation fail. could any one help me to resolve this? This is my code.

 DECLARE @xmlBody AS VARCHAR(MAX)= '<?xml version="1.0" encoding="UTF-8"?>'+ CAST(@xmlStr  AS VARCHAR(MAX)) 


       INSERT INTO  [dbo].[MasterXml]   
           ([PurchaseOrderID] ,[Code])          
        values (@PurchaseOrderID, @xmlBody)                         
        Declare @command varchar(8000)          

        SET @command= 'bcp "SELECT TOP 1 [Code] from  [tec_Dev].[dbo].[MasterXml] where PurchaseOrderID='+          
        CAST( @PurchaseOrderID As varchar(20))+'" queryout '            
         +@uploadFolder + CAST(@PurchaseOrderID AS varchar(20))+'.xml' +' -T -N -w -c -C65001'

    print @command

EXEC xp_cmdshell @command 

回答1:

I found solution

 +@uploadFolder + CAST(@PurchaseOrderID AS varchar(20))+'.xml' +' -T -N -w -c -C65001'

I remove the -w from the code. then it works. But it works only for Microsoft SQL Server 2014 (SP2-CU6) (KB4019094) - 12.0.5553.0 (X64).

https://support.microsoft.com/en-us/help/3136780/utf-8-encoding-support-for-the-bcp-utility-and-bulk-insert-transact-sq



回答2:

Looking at your code I get this is [sql-server]...

SQL-Server's VARCHAR-type is 1-byte-encoded extended ASCII (depending on the associated COLLATION) and not UTF-8...

Whenever SQL-Server is talking about unicode it is the native NVARCHAR or NCHAR type, which is a 2-byte-encoded string (UCS-2 actually).

Furthermore you are stating some switches

  • -N: Use the native types in any case, use unicode for characters.
  • -w: Explicit switch to use unicode
  • -C: with page 65001 (utf-8), while the documentation explicitly tells you, that this codepage is not supported!.

If the rest is okay, your code should place a file on disk which is 2-byte-unicode, but is yelling: "I'm UTF-8!". Try the same with UTF-16 in the XML's declaration. Alternatively you can use any other tool to convert SQL-Server's output to the encoding of your choice.

I'd tend to C#: Read the XML directly into a .NET-string (always unicode!) from SQL-Server (no BCP at all) and write the .Net-string to a file using the proper encoding.

UPDATE

At the given link (BCP documentation for SQL Server 2014) there is still

** Important ** SQL Server does not support code page 65001 (UTF-8 encoding).

But for version 2016+ you can find this:

Versions prior to version 13 ( SQL Server 2016 (13.x)) do not support code page 65001 (UTF-8 encoding). Versions beginning with 13 can import UTF-8 encoding to earlier versions of SQL Server.

But 2014-SP2 adds this to v2014 actually (KB-Article 3136780). So the documentation is not covering this backward enhancement.

It should work to export UTF-8 using this combination without the -N -w:

-c -C65001