Output XML Files with encoding UTF-8 using SQL Ser

2019-02-28 07:09发布

I have a query that generates XML files and loads them to FTP with <?xml version="1.0"?>.

I need to switch encoding to UTF-8 as follows:

<?xml version="1.0" encoding="utf-8"?> 

I can do it manually in the text editor. But cannot do it in SQL Server.

I also read this article but that did not contribute in finding solution for the issue.

https://docs.microsoft.com/en-us/sql/relational-databases/xml/create-instances-of-xml-data

enter image description here

enter image description here

My code:

USE [Audit_DBA]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

-- to call the procedure use the code below but assign your own path: 
-- EXEC [Audit_DBA].[dbo].[StateAndLocalAward_ToXML] 'C:\NLP\Test\NewFolder'

CREATE PROCEDURE [dbo].[StateAndLocalAward_ToXML_VC]
    @FileDestinationDir VARCHAR(2000)
AS 
    DECLARE  @FilePath VARCHAR(4000)

    DECLARE @SQLStr VARCHAR(8000),
            @Cmd VARCHAR(1000),
            @Ret INT

    IF EXISTS (SELECT * FROM InputTemp.SYS.TABLES WHERE NAME = 'StateAndLocalAward_output') 
       DROP TABLE InputTemp.dbo.StateAndLocalAward_output;

    ;WITH CTE_StateAndLocalAward_output AS 
    (
        SELECT TOP 200
            sc.stateContractId,
            ca.OnviaId AS [Reference], --AS OnviaID,
            ca.contractTitle AS [Title], 
            nigp.NIGPCodeTx AS [NIPGCode],
            nigp.NIGPDescriptionLongTx AS [NIPGDescription],
            ca.Description,
            vm.parentName AS [VendorName],
            offering.offeringTx AS [PrimaryOffering],
            ca.StartDate,
            ca.AwardDt
        FROM 
            ISCore.dbo.StateContract sc WITH (NOLOCK)
        INNER JOIN 
            ISCore.dbo.ContractAward ca WITH (NOLOCK) ON sc.contractAwardId = ca.contractAwardID
        INNER JOIN 
            ISLibrary.dbo.VendorMaster vm WITH (NOLOCK) ON ca.vendorId = vm.vendorID
        LEFT OUTER JOIN 
            ISCore.dbo.CompanyProfile cp WITH (NOLOCK) ON ca.vendorId = cp.vendorId
        LEFT OUTER JOIN 
            ISCore.dbo.Offering offering WITH (NOLOCK) ON ca.offeringID = offering.offeringID
        LEFT OUTER JOIN 
            ISCore.dbo.NIGPSrvc nigp WITH (NOLOCK) ON ca.NIGPCode = nigp.NIGPCodeTx
        WHERE 
            vm.showUnverified = 1 AND sc.stateContractId = -- 464482 stateContractId
            AND StartDate >= DATEADD(month, -12, GETDATE())
    )
    SELECT 
        *
    INTO 
        InputTemp.dbo.StateAndLocalAward_output 
    FROM 
        CTE_StateAndLocalAward_output;

    --select * from InputTemp.dbo.StateAndLocalAward_output

    DECLARE @StateContractId INT;
    --DECLARE @xml XML;

    DECLARE Cur_StateAndLocalAward_StateContractId CURSOR FOR
        SELECT StateContractId
        FROM inputtemp.dbo.StateAndLocalAward_output t1 WITH (NOLOCK)

    OPEN Cur_StateAndLocalAward_StateContractId;

    FETCH NEXT FROM Cur_StateAndLocalAward_StateContractId INTO @StateContractId -- @xml;

    WHILE @@FETCH_STATUS = 0 
    BEGIN
        SELECT 
            @SQLStr = 'SELECT Body.[Reference], Body.[Title], Body.[NIPGCode], Body.[NIPGDescription], Body.[Description], Body.[VendorName], Body.[PrimaryOffering] FROM InputTemp.dbo.StateAndLocalAward_output AS Body where StateContractId = ''' + str(@StateContractId) + ''' FOR XML AUTO, ELEMENTS;'

        --SELECT N'<?xml version="1.0" encoding="UTF-8"?>' 
        SELECT
            CAST((SELECT N'<?xml version="1.0" encoding="UTF-8"?>' + (@SQLSTr)) as varchar(8000) /*as XML*/) as SQLStr 

        -- select CAST((SELECT N'<?xml version="1.0" encoding="UTF-8"?>' + 
        --  (@SQLSTr)) as XML) as SQLStr 

If I run this code instead of the same above but cast as XML type, I get an error:

Msg 9402, Level 16, State 1, Procedure StateAndLocalAward_ToXML, Line 86 [Batch Start Line 10]
XML parsing: line 1, character 38, unable to switch the encoding

select @FilePath = @FileDestinationDir+'\NewFolder'+ltrim(rtrim(str(@StateContractId)))+'.xml' 
select @Cmd = ' bcp " ' + @SQLStr + '" queryout '+@FilePath+' -w -r "" -T -S ' +@@ServerName 
exec  @Ret = master.dbo.xp_cmdshell @Cmd 
FETCH NEXT FROM Cur_StateAndLocalAward_StateContractId INTO @StateContractId -- @xml;
END
CLOSE Cur_StateAndLocalAward_StateContractId;
DEALLOCATE Cur_StateAndLocalAward_StateContractId;
GO

2条回答
2楼-- · 2019-02-28 07:33
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO


CREATE PROCEDURE [dbo].[MyXMLTest]
@FileDestinationDir VARCHAR(2000)

-- to call procedure specify your own file path 
-- EXEC [Audit_DBA].[dbo].[MyXMLTest] 'E:\NLP\GovwinIQ_Ontology\NewFolder'

AS 

SET QUOTED_IDENTIFIER ON

IF OBJECT_ID (N'InputTemp.dbo.XMLTest', N'U') IS NOT NULL
DROP TABLE InputTemp.dbo.XMLTest;

CREATE TABLE InputTemp.dbo.XMLTest

(
[Id] INT NOT NULL,
[FirstName] VARCHAR(100) NOT NULL,
[LastName] VARCHAR(100) NOT NULL,
[Address] VARCHAR(100) NOT NULL
);

INSERT INTO InputTemp.dbo.XMLTest ([Id], [FirstName], [LastName], [Address])
VALUES (12, 'Zhuk', 'Termik', '123 Gam Str, Boston, NY');

--SELECT * FROM InputTemp.dbo.XMLTest

DECLARE @FilePath VARCHAR(4000)

DECLARE @SQLStr NVARCHAR(4000),
        @Cmd NVARCHAR(4000),
        @Ret INT

DECLARE @Id INT;

SELECT @Id = 12;

SELECT @SQLStr = 
'SELECT N''<?xml version=''''1.0'''' encoding=''''UTF-8''''?>'' + (SELECT CAST((SELECT [Id], [FirstName], [LastName], [Address] FROM InputTemp.dbo.XMLTest AS Body WHERE Id = '''  + str(@Id) + ''' FOR XML AUTO, ELEMENTS) AS NVARCHAR(MAX)))'

SELECT @SQLStr AS SQLStr

SELECT @FilePath = @FileDestinationDir+'\NewFolder'+ltrim(rtrim(str(@Id)))+'.xml' 

SELECT @Cmd = ' bcp " ' + @SQLStr + '" queryout '+@FilePath+' -c  -C65001 -r "" -T -S ' +@@ServerName 

EXEC @Ret = master.dbo.xp_cmdshell @Cmd 

IF OBJECT_ID (N'InputTemp.dbo.XMLTest', N'U') IS NOT NULL
DROP TABLE InputTemp.dbo.XMLTest;

GO
查看更多
唯我独甜
3楼-- · 2019-02-28 07:42

There are some things to know:

  • SQL Server does not support export via BCP to UTF-8 before version 2016 (and 2014 with SP2).
  • One cannot add the xml-declaration (<?xml blah ?>) to a native SQL-Server XML typed variable or column. This will either fail ("...switch the encoding") or the xml-declaration will disappear.
  • You can add the xml-declaration on string level to an xml casted to NVARCHAR(MAX). But you cannot re-cast (re-convert) this to an XML without failing or losing the declaration.
  • Internally SQL-Server keeps any XML as UCS-2 (very close to UTF-16) in any case.
  • SQL-Servers VARCHAR (CHAR) type is not utf-8 but extended ASCII (depending on a COLLATION)
  • on string level you can write literally anything into the xml-declaration (as you can creat something, which looks like XML, but is not well-formed. This is just an unchecked string.
  • The encoding stated in the xml-declaration is important only to mark the actual file encoding when written to a disk or when handled as byte stream.
  • You can write encoding="x" and store the file with an encoding of y - but you shouldn't.
  • SQL-Server will cast a string with an utf-8 declaration to XML when it is VARCHAR and it will cast a string with utf-16 when it is NVARCHAR, but you cannot cross this (Read this related answer). Other encodings very likely lead to cannot switch the encoding error.

About your code

  • You should change @SQLStr and @cmd to NVARCHAR(MAX), othewise you might get in troubles with non-plain-latin characters.
  • As you are using a CURSOR, you should fill an XML-typed variable with the result of your statement, cast this to NVARCHAR(MAX) and add the declaration to this string. Do not cast the result back to XML.
  • Read the BCP docs. Stating -w will write unicode (wide), which is not utf-8 (what you write into the declaration has no effect here).

Hint:

Read this related answer, showing utf-8 export with BCP on SQL-Server 2016

查看更多
登录 后发表回答