Safe-casting text to XML

Posted 2019-02-22 22:00

Question:

I have over a million rows in a SQL Server 2005 database, with a text column that contains XML strings. I want to cast the text to the XML data type so that I can extract parts of the data.

The problem is that some records throw errors when cast (i.e. they contain invalid XML). How can I ignore these errors so that all valid XML is cast correctly and invalid XML is stored as NULL?

Answer 1:

I was once in a similar situation. I added an XML column to the same table as the text column, then used an RBAR ("row by agonizing row") process to try to copy the "XML" from the text column into the new XML column, one row at a time. It's not the fastest approach, but each row is written and committed on its own, and this is a one-time job, right? This assumes your table has an int primary key.

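-- one-time change: add the target column next to the text column
alter table XMLTable add XMLColumn xml null;
go

declare @minid int, @maxid int;

select @minid = min(ID), @maxid = max(ID) from XMLTable;

while @minid <= @maxid
begin
    begin try
        -- invalid XML raises a parse error here, which jumps to the CATCH block
        update t
        set XMLColumn = cast(TextColumn as xml)
        from XMLTable t
        where ID = @minid;
    end try
    begin catch
        -- XMLColumn stays NULL for this row; log the ID and the reason
        print 'XML transform failed on record ID: ' + cast(@minid as varchar(10))
            + ' (' + error_message() + ')';
    end catch

    -- advance to the next record (runs whether or not the update succeeded)
    set @minid = @minid + 1;
end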
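The loop above steps through every integer between the minimum and maximum ID, so gaps in the key (deleted rows, identity jumps) turn into empty iterations. A minimal variant, assuming the same XMLTable, ID, TextColumn and XMLColumn names and that the XML column has already been added, jumps straight to the next existing ID instead. It still only uses SQL Server 2005 syntax (no inline DECLARE initialization, no `+=`).

```sql
-- Sketch: visit only IDs that actually exist.
-- Assumes XMLTable(ID int primary key, TextColumn text, XMLColumn xml null).
declare @id int;

-- start at the smallest existing ID (NULL if the table is empty)
select @id = min(ID) from XMLTable;

while @id is not null
begin
    begin try
        -- invalid XML raises an error here and control moves to CATCH
        update XMLTable
        set XMLColumn = cast(TextColumn as xml)
        where ID = @id;
    end try
    begin catch
        -- XMLColumn stays NULL for this row
        print 'XML transform failed on record ID: ' + cast(@id as varchar(10));
    end catch

    -- MIN() over an empty set returns NULL, which ends the loop
    select @id = min(ID) from XMLTable where ID > @id;
end
```

If ID is the clustered primary key, each "next ID" lookup is a cheap index seek, so the extra query per row costs little compared with the per-row UPDATE itself.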


Answer 2:

I know this is SQL Server 2012+ functionality, but since this question is the top Google result, here it is:

SELECT 
COALESCE(TRY_CONVERT(xml, '</bad xml>'), 'InvalidXML')

See the documentation: TRY_CONVERT (Transact-SQL).
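On SQL Server 2012 or later, TRY_CONVERT also removes the need for a row-by-row loop: it returns NULL instead of raising an error when a value cannot be converted. A minimal sketch, assuming the XMLTable / TextColumn / XMLColumn names used in the first answer and an existing nullable xml column; depending on your setup, the database may also need compatibility level 110 or higher for TRY_CONVERT to be recognized.

```sql
-- Sketch (SQL Server 2012+): one set-based statement instead of a loop.
-- Valid XML is converted; invalid XML becomes NULL instead of an error.
update XMLTable
set XMLColumn = try_convert(xml, TextColumn);

-- List the rows that had text but failed to convert.
select ID
from XMLTable
where TextColumn is not null
  and XMLColumn is null;
```

On a million-row table this runs as a single transaction, so you may prefer to split it into batches (for example by ID range) to keep transaction log growth manageable.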



Answer 3:

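Another possibility is to write a .NET (SQLCLR) assembly that loads the XML into an XmlDocument and returns a bool indicating whether it is valid, so that you only parse the valid rows in SQL. Note that XmlDocument's rules are not identical to those of the SQL Server xml type (for example, the xml type accepts fragments with several top-level elements, while XmlDocument requires a single root element), so the two checks may disagree on some rows.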