Why does CHECKSUM_AGG() return same values for tot

2019-06-22 04:39发布

问题:

I don't really get how CHECKSUM_AGG() works, although I understood that it is somehow built by using XORs. Which explains why it returns 0 when you pass just equal integers.

But why do I get the same aggregated checksum in the following SQL, where the input values are unique?

DECLARE @test1 TABLE (chksum INT)
INSERT INTO @test1 VALUES (2147473855), (2147473343)
SELECT CHECKSUM_AGG(chksum)
FROM @test1

DECLARE @test2 TABLE (chksum INT)
INSERT INTO @test2 VALUES (2147474831), (2147472271)
SELECT CHECKSUM_AGG(chksum)
FROM @test2

An explanation would be much appreciated. Thanks!

回答1:

There are known issues with SQL Server CHECKSUM and CHECKSUM_AGG implementations: CHECKSUM weakness explained

Use HASHBYTES instead: Using HASHBYTES to compare columns

From Microsoft: If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.

You may not use HASHBYTES across rows directly - there is one workaround here.

Here is comparison for smaller numbers, using HASBYTES workaround:

DECLARE @test1 TABLE (chksum INT) 
DECLARE @test2 TABLE (chksum INT)

INSERT INTO @test1 VALUES (50), (3), (26)
INSERT INTO @test2 VALUES (45), (0), (6)

SELECT [Values]    = '50, 3, 26', 
       [Checksum]  = CHECKSUM_AGG(chksum),
       -- HashBytes is limited to 8000 bytes only
       [Hashbytes] = HashBytes('md5',convert(varbinary(max),(SELECT * FROM @test1 FOR XML AUTO)))
FROM @test1

UNION ALL
SELECT  [Values]    = '45, 0, 6',      
        [Checksum]  = CHECKSUM_AGG(chksum),
        -- HashBytes is limited to 8000 bytes only
        [Hashbytes] = HashBytes('md5',convert(varbinary(max),(SELECT * FROM @test2 FOR XML AUTO)))
FROM @test2



标签: tsql