I don't really get how CHECKSUM_AGG()
works, although I understood that it is somehow built by using XORs. Which explains why it returns 0
when you pass just equal integers.
But why do I get the same aggregated checksum in the following SQL, where the input values are unique?
DECLARE @test1 TABLE (chksum INT)
INSERT INTO @test1 VALUES (2147473855), (2147473343)
SELECT CHECKSUM_AGG(chksum)
FROM @test1
DECLARE @test2 TABLE (chksum INT)
INSERT INTO @test2 VALUES (2147474831), (2147472271)
SELECT CHECKSUM_AGG(chksum)
FROM @test2
An explanation would be much appreciated. Thanks!
There are known issues with SQL Server CHECKSUM and CHECKSUM_AGG implementations: CHECKSUM weakness explained
Use HASHBYTES instead: Using HASHBYTES to compare columns
From Microsoft:
If one of the values in the expression list changes, the checksum of the list also generally changes. However, there is a small chance that the checksum will not change. For this reason, we do not recommend using CHECKSUM to detect whether values have changed, unless your application can tolerate occasionally missing a change. Consider using HashBytes instead. When an MD5 hash algorithm is specified, the probability of HashBytes returning the same result for two different inputs is much lower than that of CHECKSUM.
You may not use HASHBYTES across rows directly - there is one workaround here.
Here is comparison for smaller numbers, using HASBYTES workaround:
DECLARE @test1 TABLE (chksum INT)
DECLARE @test2 TABLE (chksum INT)
INSERT INTO @test1 VALUES (50), (3), (26)
INSERT INTO @test2 VALUES (45), (0), (6)
SELECT [Values] = '50, 3, 26',
[Checksum] = CHECKSUM_AGG(chksum),
-- HashBytes is limited to 8000 bytes only
[Hashbytes] = HashBytes('md5',convert(varbinary(max),(SELECT * FROM @test1 FOR XML AUTO)))
FROM @test1
UNION ALL
SELECT [Values] = '45, 0, 6',
[Checksum] = CHECKSUM_AGG(chksum),
-- HashBytes is limited to 8000 bytes only
[Hashbytes] = HashBytes('md5',convert(varbinary(max),(SELECT * FROM @test2 FOR XML AUTO)))
FROM @test2