I am trying to convert data in a varchar column to XML but I was getting errors with certain characters. Running this ...
-- This fails
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(@Data AS XML) AS DataXml
... results in the following error
Msg 9420, Level 16, State 1, Line 3
XML parsing: line 1, character 55, illegal xml character
It appears that it's the broken pipe character that is causing the error but I thought that it was a valid character for UTF-8. Looking at the XML spec it appears to be valid.
When I change it to this ...
-- This works
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(CAST(@Data AS NVARCHAR(MAX)), 'encoding="utf-8"', '') AS XML) AS DataXml
... it works without error (replacing encoding string to utf-16 also works). I'm using SQL Server 2008 R2 with SQL_Latin1_General_CP1_CI_AS Coallation.
Can anyone tell my why I need to convert to NVARCHAR
and strip the encoding="utf-8"
for this to work?
Thanks,
Edit
It appears that this also works ...
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(@Data, 'encoding="utf-8"', '') AS XML) AS DataXml
Removing the utf-8 encoding from the prolog is sufficient for SQL Server to do the conversion.
Your pipe character is using Unicode codepoint
U+00A6 BROKEN BAR
instead ofU+007C VERTICAL LINE
.U+00A6
is outside of ASCII.VARCHAR
does not support non-ASCII characters. That is why you have to useNVARCHAR
instead, which is designed to handle Unicode data.