T-SQL: How can I compare two variables of type XML

2019-04-09 02:32发布

问题:

Using only SQL Server 2008 R2 (this is going to be in a stored proc), how can I determine if two variables of type XML are equivalent?

Here is what I want to do:

DECLARE @XmlA   XML
DECLARE @XmlB   XML

SET @XmlA = '[Really long Xml value]'
SET @XmlB = '[Really long Xml value]'

IF @XmlA = @XmlB
    SELECT 'Matching Xml!'

But as you probably know, it returns:

Msg 305, Level 16, State 1, Line 7 The XML data type cannot be compared or sorted, except when using the IS NULL operator.

I can convert to VarChar(MAX) and compare, but that only compares the first 2MB. Is there another way?

回答1:

I stumbled upon this fairly comprehensive article which goes into more detail of actually comparing the CONTENT of 2 XML entries to determine whether they are the same. It makes sense, as the ordering of attributes in nodes CAN differ, even though their values are exactly the same. I'd recommend you read through it and even implement the function to see if it works for you... I tried it out quickly and it seemed to work for me?



回答2:

Check this SQL function:

CREATE FUNCTION [dbo].[CompareXml]
(
    @xml1 XML,
    @xml2 XML
)
RETURNS INT
AS 
BEGIN
    DECLARE @ret INT
    SELECT @ret = 0


    -- -------------------------------------------------------------
    -- If one of the arguments is NULL then we assume that they are
    -- not equal. 
    -- -------------------------------------------------------------
    IF @xml1 IS NULL OR @xml2 IS NULL 
    BEGIN
        RETURN 1
    END

    -- -------------------------------------------------------------
    -- Match the name of the elements 
    -- -------------------------------------------------------------
    IF  (SELECT @xml1.value('(local-name((/*)[1]))','VARCHAR(MAX)')) 
        <> 
        (SELECT @xml2.value('(local-name((/*)[1]))','VARCHAR(MAX)'))
    BEGIN
        RETURN 1
    END

     ---------------------------------------------------------------
     --Match the value of the elements
     ---------------------------------------------------------------
    IF((@xml1.query('count(/*)').value('.','INT') = 1) AND (@xml2.query('count(/*)').value('.','INT') = 1))
    BEGIN
    DECLARE @elValue1 VARCHAR(MAX), @elValue2 VARCHAR(MAX)

    SELECT
        @elValue1 = @xml1.value('((/*)[1])','VARCHAR(MAX)'),
        @elValue2 = @xml2.value('((/*)[1])','VARCHAR(MAX)')

    IF  @elValue1 <> @elValue2
    BEGIN
        RETURN 1
    END
    END

    -- -------------------------------------------------------------
    -- Match the number of attributes 
    -- -------------------------------------------------------------
    DECLARE @attCnt1 INT, @attCnt2 INT
    SELECT
        @attCnt1 = @xml1.query('count(/*/@*)').value('.','INT'),
        @attCnt2 = @xml2.query('count(/*/@*)').value('.','INT')

    IF  @attCnt1 <> @attCnt2 BEGIN
        RETURN 1
    END


    -- -------------------------------------------------------------
    -- Match the attributes of attributes 
    -- Here we need to run a loop over each attribute in the 
    -- first XML element and see if the same attribut exists
    -- in the second element. If the attribute exists, we
    -- need to check if the value is the same.
    -- -------------------------------------------------------------
    DECLARE @cnt INT, @cnt2 INT
    DECLARE @attName VARCHAR(MAX)
    DECLARE @attValue VARCHAR(MAX)

    SELECT @cnt = 1

    WHILE @cnt <= @attCnt1 
    BEGIN
        SELECT @attName = NULL, @attValue = NULL
        SELECT
            @attName = @xml1.value(
                'local-name((/*/@*[sql:variable("@cnt")])[1])', 
                'varchar(MAX)'),
            @attValue = @xml1.value(
                '(/*/@*[sql:variable("@cnt")])[1]', 
                'varchar(MAX)')

        -- check if the attribute exists in the other XML document
        IF @xml2.exist(
                '(/*/@*[local-name()=sql:variable("@attName")])[1]'
            ) = 0
        BEGIN
            RETURN 1
        END

        IF  @xml2.value(
                '(/*/@*[local-name()=sql:variable("@attName")])[1]', 
                'varchar(MAX)')
            <>
            @attValue
        BEGIN
            RETURN 1
        END

        SELECT @cnt = @cnt + 1
    END

    -- -------------------------------------------------------------
    -- Match the number of child elements 
    -- -------------------------------------------------------------
    DECLARE @elCnt1 INT, @elCnt2 INT
    SELECT
        @elCnt1 = @xml1.query('count(/*/*)').value('.','INT'),
        @elCnt2 = @xml2.query('count(/*/*)').value('.','INT')


    IF  @elCnt1 <> @elCnt2
    BEGIN
        RETURN 1
    END


    -- -------------------------------------------------------------
    -- Start recursion for each child element
    -- -------------------------------------------------------------
    SELECT @cnt = 1
    SELECT @cnt2 = 1
    DECLARE @x1 XML, @x2 XML
    DECLARE @noMatch INT

    WHILE @cnt <= @elCnt1 
    BEGIN

        SELECT @x1 = @xml1.query('/*/*[sql:variable("@cnt")]')
    --RETURN CONVERT(VARCHAR(MAX),@x1)
    WHILE @cnt2 <= @elCnt2
    BEGIN
        SELECT @x2 = @xml2.query('/*/*[sql:variable("@cnt2")]')
        SELECT @noMatch = dbo.CompareXml( @x1, @x2 )
        IF @noMatch = 0 BREAK
        SELECT @cnt2 = @cnt2 + 1
    END

    SELECT @cnt2 = 1

        IF @noMatch = 1
        BEGIN
            RETURN 1
        END

        SELECT @cnt = @cnt + 1
    END

    RETURN @ret
END

Here is the Source


The function fails to compare XML fragments e.g. when there is not a single root element, like:

SELECT dbo.CompareXml('<data/>', '<data/><data234/>') 

In order to fix this, you must wrap your XMLs in root elements, when they are passed to the function or edit the function to do this. For, example:

SELECT dbo.CompareXml('<r><data/></r>', '<r><data/><data234/></r>')  


回答3:

There are many different ways of comparing two XML documents, and a lot depends on what kind of differences you want to tolerate: you definitely need to tolerate differences in encoding, attribute order, insignificant whitespace, numeric character references, and use of attribute delimiters, and you should probably also tolerate differences in use of comments, namespace prefixes, and CDATA. So comparing two XML documents as strings is definitely not a good idea - unless you invoke XML canonicalization first.

For many purposes the XQuery deep-equals() function does the right thing (and is more-or-less equivalent to comparing the canonical forms of the two XML documents). I don't know enough about Microsoft's SQL Server implementation of XQuery to tell you how to invoke this from the SQL level.



回答4:

You may cast fields to varbinary(max), hash them and compare hashes. But you definitely miss if XMLs are equivalent but not identical

To calculate hash you may use either CLR function:

using System;
using System.Data.SqlTypes;
using System.IO;

namespace ClrHelpers
{
    public partial class UserDefinedFunctions {
        [Microsoft.SqlServer.Server.SqlFunction]
        public static Guid HashMD5(SqlBytes data) {
            System.Security.Cryptography.MD5CryptoServiceProvider md5 = new System.Security.Cryptography.MD5CryptoServiceProvider();
            md5.Initialize();
            int len = 0;
            byte[] b = new byte[8192];
            Stream s = data.Stream;
            do {
                len = s.Read(b, 0, 8192);
                md5.TransformBlock(b, 0, len, b, 0);
            } while(len > 0);
            md5.TransformFinalBlock(b, 0, 0);
            Guid g = new Guid(md5.Hash);
            return g;
        }
    };
}

Or sql function:

CREATE FUNCTION dbo.GetMyLongHash(@data VARBINARY(MAX))
RETURNS VARBINARY(MAX)
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
    DECLARE @res VARBINARY(MAX) = 0x
    DECLARE @position INT = 1, @len INT = DATALENGTH(@data)

    WHILE 1 = 1
    BEGIN
        SET @res = @res + HASHBYTES('MD5', SUBSTRING(@data, @position, 8000))
        SET @position = @position+8000
        IF @Position > @len 
          BREAK
    END
    WHILE DATALENGTH(@res) > 16 SET @res= dbo.GetMyLongHash(@res)
    RETURN @res
END


回答5:

If you can use SQL CLR, I suggest to write a function using XNode.DeepEquals Method:

var xmlTree1 = new XElement("Root",
    new XAttribute("Att1", 1),
    new XAttribute("Att2", 2),
    new XElement("Child1", 1),
    new XElement("Child2", "some content")
);
var xmlTree2 = new XElement("Root",
    new XAttribute("Att1", 1),
    new XAttribute("Att2", 2),
    new XElement("Child1", 1),
    new XElement("Child2", "some content")
);
Console.WriteLine(XNode.DeepEquals(xmlTree1, xmlTree2));

If you cannot, you can write your own function (see SQL FIDDLE EXAMPLE):

CREATE function [dbo].[udf_XML_Is_Equal]
(
    @Data1 xml,
    @Data2 xml
)
returns bit
as
begin
    declare
        @i bigint, @cnt1 bigint, @cnt2 bigint,
        @Sub_Data1 xml, @Sub_Data2 xml,
        @Name varchar(max), @Value1 nvarchar(max), @Value2 nvarchar(max)

    if @Data1 is null or @Data2 is null
        return 1

    --=========================================================================================================
    -- If more than one root - recurse for each element
    --=========================================================================================================
    select
        @cnt1 = @Data1.query('count(/*)').value('.','int'),
        @cnt2 = @Data1.query('count(/*)').value('.','int')

    if @cnt1 <> @cnt2
        return 0        

    if @cnt1 > 1
    begin
        select @i = 1
        while @i <= @cnt1
        begin
            select
                @Sub_Data1 = @Data1.query('/*[sql:variable("@i")]'),
                @Sub_Data2 = @Data2.query('/*[sql:variable("@i")]')

            if dbo.udf_XML_Is_Equal_New(@Sub_Data1, @Sub_Data2) = 0
                return 0

            select @i = @i + 1
        end

        return 1
    end

    --=========================================================================================================
    -- Comparing root data
    --=========================================================================================================
    if @Data1.value('local-name(/*[1])','nvarchar(max)') <> @Data2.value('local-name(/*[1])','nvarchar(max)') 
        return 0

    if @Data1.value('/*[1]', 'nvarchar(max)') <> @Data2.value('/*[1]', 'nvarchar(max)')
        return 0

    --=========================================================================================================
    -- Comparing attributes
    --=========================================================================================================
    select
        @cnt1 = @Data1.query('count(/*[1]/@*)').value('.','int'),
        @cnt2 = @Data1.query('count(/*[1]/@*)').value('.','int')

    if @cnt1 <> @cnt2
        return 0

    if exists (
        select *
        from
        (
            select
                T.C.value('local-name(.)', 'nvarchar(max)') as Name,
                T.C.value('.', 'nvarchar(max)') as Value
            from @Data1.nodes('/*[1]/@*') as T(C)
        ) as D1
        full outer join
        (
            select
                T.C.value('local-name(.)', 'nvarchar(max)') as Name,
                T.C.value('.', 'nvarchar(max)') as Value
            from @Data2.nodes('/*[1]/@*') as T(C)
        ) as D2
        on D1.Name = D2.Name
        where
            not
            (
                D1.Value is null and D2.Value is null or
                D1.Value is not null and D2.Value is not null and D1.Value = D2.Value
            )
    )
        return 0


    --=========================================================================================================
    -- Recursively running for each child
    --=========================================================================================================
    select
        @cnt1 = @Data1.query('count(/*[1]/*)').value('.','int'),
        @cnt2 = @Data2.query('count(/*[1]/*)').value('.','int')

    if @cnt1 <> @cnt2
        return 0    

    select @i = 1
    while @i <= @cnt1        
    begin
        select
            @Sub_Data1 = @Data1.query('/*/*[sql:variable("@i")]'),
            @Sub_Data2 = @Data2.query('/*/*[sql:variable("@i")]')

        if dbo.udf_XML_Is_Equal(@Sub_Data1, @Sub_Data2) = 0
            return 0

        select @i = @i + 1
    end

    return 1
END