Msg 9402, Level 16, State 1, Line 9 XML parsing: l

2019-07-09 20:48发布

A piece of tsql code doesnt behave the same from production to Test environment. When the code below is executed on prod it brings back data

SELECT [col1xml]

FROM [DBName].[dbo].[Table1] (NOLOCK)

WHERE (cast([col1xml] as xml).value('(/Payment/****/trn1)[1]','nvarchar(20)') ='123456'))

However that same code brings back the below error when ran in Test.

Msg 9402, Level 16, State 1, Line 9 XML parsing: line 1, character 38, unable to switch the encoding

I have seen the fix provided by this site of conversion of UTF and this works in both prod and test. See below. However i need to provide an answer to the developers of why this behavior is occurring and a rationale why they should change their code(if that is the case)

WHERE CAST(

REPLACE(CAST(col1xml AS VARCHAR(MAX)), 'encoding="utf-16"', 'encoding="utf-8"')

AS XML).value('(/Payment/****/trn1)[1]','NVARCHAR(max)') ='123456')

I have compared both DB's and looked for anything obvious such as ansi nulls and ansi padding. Everything is the same and the version of SQL Server. This is SQL SERVER 2012 11.0.5388 version. Data between environments is different but the table schema is identical and the data type for col1xml is ntext.

1条回答
一纸荒年 Trace。
2楼-- · 2019-07-09 21:15

In SQL Server you should store XML in a column typed XML. This native type has a got a lot of advantages. It is much faster and has implicit validity checks.

From your question I take, that you store your XML in NTEXT. This type is deprecated for centuries and will not be supported in future versions! You ought to change this soon!

SQL-Server knows two kinds of strings:

  • 1 byte strings (CHAR or VARCHAR), which is extended ASCII
    Important: This is not UTF-8! Native UTF-8 support will be part of a coming version.
  • 2 byte string (NCHAR or NVARCHAR), which is UTF-16 (UCS-2)

If the XML has a leading declaration with an encoding (in most cases this is utf-8 or utf-16) you can get into troubles.

If the XML is stored as 2-byte-string (at least the NTEXT tells me this), the declaration has to be utf-16. With a 1-byte-string it should be utf-8.

The best (and easiest) was to ommit the declaration completely. You do not need it. Storing the XML in the appropriate type will kill this declaration automatically.

What you should do: Create a new column of type XML and shuffle all your XMLs to this column. Get rid of any TEXT, NTEXT and IMAGE columns you might have!

The next step is: Be happy and enjoy the fast and easy going with the native XML type :-D

UPDATE Differences in environment

You write: Data between environments is different

The error happens here:

cast([col1xml] as xml)

If your column would store the XML in the native type, you would not need a cast (which is very expensive!!) at all. But in your case this cast depends on the actual XML. As this is stored in NTEXT it is 2-byte-string. If your XML starts with a declaration stating a non-supported encoding (in most cases utf-8), this will fail.

Try this:

This works

DECLARE @xml2Byte_UTF16 NVARCHAR(100)='<?xml version="1.0" encoding="utf-16"?><root>test1</root>';
SELECT CAST(@xml2Byte_UTF16 AS XML);

DECLARE @xml1Byte_UTF8 VARCHAR(100)='<?xml version="1.0" encoding="utf-8"?><root>test2</root>';
SELECT CAST(@xml1Byte_UTF8 AS XML);

This fails

DECLARE @xml2Byte_UTF8 NVARCHAR(100)='<?xml version="1.0" encoding="utf-8"?><root>test3</root>';
SELECT CAST(@xml2Byte_UTF8 AS XML);

DECLARE @xml1Byte_UTF16 VARCHAR(100)='<?xml version="1.0" encoding="utf-16"?><root>test4</root>';
SELECT CAST(@xml1Byte_UTF16 AS XML);

Play around with VARCHAR and NVARCHAR and utf-8 and utf-16...

查看更多
登录 后发表回答