可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
Using only SQL Server 2008 R2 (this is going to be in a stored proc), how can I determine if two variables of type XML are equivalent?
Here is what I want to do:
DECLARE @XmlA XML
DECLARE @XmlB XML
SET @XmlA = '[Really long Xml value]'
SET @XmlB = '[Really long Xml value]'
IF @XmlA = @XmlB
SELECT 'Matching Xml!'
But as you probably know, it returns:
Msg 305, Level 16, State 1, Line 7 The XML data type cannot be
compared or sorted, except when using the IS NULL operator.
I can convert to VarChar(MAX)
and compare, but that only compares the first 2MB. Is there another way?
回答1:
I stumbled upon this fairly comprehensive article which goes into more detail of actually comparing the CONTENT of 2 XML entries to determine whether they are the same. It makes sense, as the ordering of attributes in nodes CAN differ, even though their values are exactly the same. I'd recommend you read through it and even implement the function to see if it works for you... I tried it out quickly and it seemed to work for me?
回答2:
Check this SQL function:
CREATE FUNCTION [dbo].[CompareXml]
(
@xml1 XML,
@xml2 XML
)
RETURNS INT
AS
BEGIN
DECLARE @ret INT
SELECT @ret = 0
-- -------------------------------------------------------------
-- If one of the arguments is NULL then we assume that they are
-- not equal.
-- -------------------------------------------------------------
IF @xml1 IS NULL OR @xml2 IS NULL
BEGIN
RETURN 1
END
-- -------------------------------------------------------------
-- Match the name of the elements
-- -------------------------------------------------------------
IF (SELECT @xml1.value('(local-name((/*)[1]))','VARCHAR(MAX)'))
<>
(SELECT @xml2.value('(local-name((/*)[1]))','VARCHAR(MAX)'))
BEGIN
RETURN 1
END
---------------------------------------------------------------
--Match the value of the elements
---------------------------------------------------------------
IF((@xml1.query('count(/*)').value('.','INT') = 1) AND (@xml2.query('count(/*)').value('.','INT') = 1))
BEGIN
DECLARE @elValue1 VARCHAR(MAX), @elValue2 VARCHAR(MAX)
SELECT
@elValue1 = @xml1.value('((/*)[1])','VARCHAR(MAX)'),
@elValue2 = @xml2.value('((/*)[1])','VARCHAR(MAX)')
IF @elValue1 <> @elValue2
BEGIN
RETURN 1
END
END
-- -------------------------------------------------------------
-- Match the number of attributes
-- -------------------------------------------------------------
DECLARE @attCnt1 INT, @attCnt2 INT
SELECT
@attCnt1 = @xml1.query('count(/*/@*)').value('.','INT'),
@attCnt2 = @xml2.query('count(/*/@*)').value('.','INT')
IF @attCnt1 <> @attCnt2 BEGIN
RETURN 1
END
-- -------------------------------------------------------------
-- Match the attributes of attributes
-- Here we need to run a loop over each attribute in the
-- first XML element and see if the same attribut exists
-- in the second element. If the attribute exists, we
-- need to check if the value is the same.
-- -------------------------------------------------------------
DECLARE @cnt INT, @cnt2 INT
DECLARE @attName VARCHAR(MAX)
DECLARE @attValue VARCHAR(MAX)
SELECT @cnt = 1
WHILE @cnt <= @attCnt1
BEGIN
SELECT @attName = NULL, @attValue = NULL
SELECT
@attName = @xml1.value(
'local-name((/*/@*[sql:variable("@cnt")])[1])',
'varchar(MAX)'),
@attValue = @xml1.value(
'(/*/@*[sql:variable("@cnt")])[1]',
'varchar(MAX)')
-- check if the attribute exists in the other XML document
IF @xml2.exist(
'(/*/@*[local-name()=sql:variable("@attName")])[1]'
) = 0
BEGIN
RETURN 1
END
IF @xml2.value(
'(/*/@*[local-name()=sql:variable("@attName")])[1]',
'varchar(MAX)')
<>
@attValue
BEGIN
RETURN 1
END
SELECT @cnt = @cnt + 1
END
-- -------------------------------------------------------------
-- Match the number of child elements
-- -------------------------------------------------------------
DECLARE @elCnt1 INT, @elCnt2 INT
SELECT
@elCnt1 = @xml1.query('count(/*/*)').value('.','INT'),
@elCnt2 = @xml2.query('count(/*/*)').value('.','INT')
IF @elCnt1 <> @elCnt2
BEGIN
RETURN 1
END
-- -------------------------------------------------------------
-- Start recursion for each child element
-- -------------------------------------------------------------
SELECT @cnt = 1
SELECT @cnt2 = 1
DECLARE @x1 XML, @x2 XML
DECLARE @noMatch INT
WHILE @cnt <= @elCnt1
BEGIN
SELECT @x1 = @xml1.query('/*/*[sql:variable("@cnt")]')
--RETURN CONVERT(VARCHAR(MAX),@x1)
WHILE @cnt2 <= @elCnt2
BEGIN
SELECT @x2 = @xml2.query('/*/*[sql:variable("@cnt2")]')
SELECT @noMatch = dbo.CompareXml( @x1, @x2 )
IF @noMatch = 0 BREAK
SELECT @cnt2 = @cnt2 + 1
END
SELECT @cnt2 = 1
IF @noMatch = 1
BEGIN
RETURN 1
END
SELECT @cnt = @cnt + 1
END
RETURN @ret
END
Here is the Source
The function fails to compare XML fragments e.g. when there is not a single root element, like:
SELECT dbo.CompareXml('<data/>', '<data/><data234/>')
In order to fix this, you must wrap your XMLs in root
elements, when they are passed to the function or edit the function to do this. For, example:
SELECT dbo.CompareXml('<r><data/></r>', '<r><data/><data234/></r>')
回答3:
There are many different ways of comparing two XML documents, and a lot depends on what kind of differences you want to tolerate: you definitely need to tolerate differences in encoding, attribute order, insignificant whitespace, numeric character references, and use of attribute delimiters, and you should probably also tolerate differences in use of comments, namespace prefixes, and CDATA. So comparing two XML documents as strings is definitely not a good idea - unless you invoke XML canonicalization first.
For many purposes the XQuery deep-equals() function does the right thing (and is more-or-less equivalent to comparing the canonical forms of the two XML documents). I don't know enough about Microsoft's SQL Server implementation of XQuery to tell you how to invoke this from the SQL level.
回答4:
You may cast fields to varbinary(max), hash them and compare hashes. But you definitely miss if XMLs are equivalent but not identical
To calculate hash you may use either CLR function:
using System;
using System.Data.SqlTypes;
using System.IO;
namespace ClrHelpers
{
public partial class UserDefinedFunctions {
[Microsoft.SqlServer.Server.SqlFunction]
public static Guid HashMD5(SqlBytes data) {
System.Security.Cryptography.MD5CryptoServiceProvider md5 = new System.Security.Cryptography.MD5CryptoServiceProvider();
md5.Initialize();
int len = 0;
byte[] b = new byte[8192];
Stream s = data.Stream;
do {
len = s.Read(b, 0, 8192);
md5.TransformBlock(b, 0, len, b, 0);
} while(len > 0);
md5.TransformFinalBlock(b, 0, 0);
Guid g = new Guid(md5.Hash);
return g;
}
};
}
Or sql function:
CREATE FUNCTION dbo.GetMyLongHash(@data VARBINARY(MAX))
RETURNS VARBINARY(MAX)
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN
DECLARE @res VARBINARY(MAX) = 0x
DECLARE @position INT = 1, @len INT = DATALENGTH(@data)
WHILE 1 = 1
BEGIN
SET @res = @res + HASHBYTES('MD5', SUBSTRING(@data, @position, 8000))
SET @position = @position+8000
IF @Position > @len
BREAK
END
WHILE DATALENGTH(@res) > 16 SET @res= dbo.GetMyLongHash(@res)
RETURN @res
END
回答5:
If you can use SQL CLR, I suggest to write a function using XNode.DeepEquals Method:
var xmlTree1 = new XElement("Root",
new XAttribute("Att1", 1),
new XAttribute("Att2", 2),
new XElement("Child1", 1),
new XElement("Child2", "some content")
);
var xmlTree2 = new XElement("Root",
new XAttribute("Att1", 1),
new XAttribute("Att2", 2),
new XElement("Child1", 1),
new XElement("Child2", "some content")
);
Console.WriteLine(XNode.DeepEquals(xmlTree1, xmlTree2));
If you cannot, you can write your own function (see SQL FIDDLE EXAMPLE):
CREATE function [dbo].[udf_XML_Is_Equal]
(
@Data1 xml,
@Data2 xml
)
returns bit
as
begin
declare
@i bigint, @cnt1 bigint, @cnt2 bigint,
@Sub_Data1 xml, @Sub_Data2 xml,
@Name varchar(max), @Value1 nvarchar(max), @Value2 nvarchar(max)
if @Data1 is null or @Data2 is null
return 1
--=========================================================================================================
-- If more than one root - recurse for each element
--=========================================================================================================
select
@cnt1 = @Data1.query('count(/*)').value('.','int'),
@cnt2 = @Data1.query('count(/*)').value('.','int')
if @cnt1 <> @cnt2
return 0
if @cnt1 > 1
begin
select @i = 1
while @i <= @cnt1
begin
select
@Sub_Data1 = @Data1.query('/*[sql:variable("@i")]'),
@Sub_Data2 = @Data2.query('/*[sql:variable("@i")]')
if dbo.udf_XML_Is_Equal_New(@Sub_Data1, @Sub_Data2) = 0
return 0
select @i = @i + 1
end
return 1
end
--=========================================================================================================
-- Comparing root data
--=========================================================================================================
if @Data1.value('local-name(/*[1])','nvarchar(max)') <> @Data2.value('local-name(/*[1])','nvarchar(max)')
return 0
if @Data1.value('/*[1]', 'nvarchar(max)') <> @Data2.value('/*[1]', 'nvarchar(max)')
return 0
--=========================================================================================================
-- Comparing attributes
--=========================================================================================================
select
@cnt1 = @Data1.query('count(/*[1]/@*)').value('.','int'),
@cnt2 = @Data1.query('count(/*[1]/@*)').value('.','int')
if @cnt1 <> @cnt2
return 0
if exists (
select *
from
(
select
T.C.value('local-name(.)', 'nvarchar(max)') as Name,
T.C.value('.', 'nvarchar(max)') as Value
from @Data1.nodes('/*[1]/@*') as T(C)
) as D1
full outer join
(
select
T.C.value('local-name(.)', 'nvarchar(max)') as Name,
T.C.value('.', 'nvarchar(max)') as Value
from @Data2.nodes('/*[1]/@*') as T(C)
) as D2
on D1.Name = D2.Name
where
not
(
D1.Value is null and D2.Value is null or
D1.Value is not null and D2.Value is not null and D1.Value = D2.Value
)
)
return 0
--=========================================================================================================
-- Recursively running for each child
--=========================================================================================================
select
@cnt1 = @Data1.query('count(/*[1]/*)').value('.','int'),
@cnt2 = @Data2.query('count(/*[1]/*)').value('.','int')
if @cnt1 <> @cnt2
return 0
select @i = 1
while @i <= @cnt1
begin
select
@Sub_Data1 = @Data1.query('/*/*[sql:variable("@i")]'),
@Sub_Data2 = @Data2.query('/*/*[sql:variable("@i")]')
if dbo.udf_XML_Is_Equal(@Sub_Data1, @Sub_Data2) = 0
return 0
select @i = @i + 1
end
return 1
END