On several places I've read that crc32 is additive and so: CRC(A xor B) = CRC(A) xor CRC(B).
The above statement was disproven by the following code I wrote:
import zlib
def crc32(data):
return zlib.crc32(data) & 0xffffffff
print crc32(chr(ord("A") ^ ord("B")))
print crc32("A") ^ crc32("B")
Program output:
1259060791
2567524794
Could someone provide a proper code proving this theory or point me where I've failed?
CRC is additive in the mathematical sense since the CRC hash is just a remainder value from a carryless division of all the data (treated as a giant integer) divided by the polynomial constant. Using your example, it's akin to this sort of thing:
7 mod 5 = 2
6 mod 5 = 1
(7 mod 5) + (6 mod 5) = 3
(7 + 6) mod 5 = 3
In that analogy, '5' is our CRC polynomial.
Here's an example to play with (gcc based):
#include <stdio.h>
#include <x86intrin.h>
int main(void)
{
unsigned int crc_a = __builtin_ia32_crc32si( 0, 5);
printf( "crc(5) = %08X\n", crc_a );
unsigned int crc_b = __builtin_ia32_crc32si( 0, 7);
printf( "crc(7) = %08X\n", crc_b );
unsigned int crc_xor = crc_a ^ crc_b;
printf( "crc(5) XOR crc(7) = %08X\n", crc_xor );
unsigned int crc_xor2 = __builtin_ia32_crc32si( 0, 5 ^ 7);
printf( "crc(5 XOR 7) = %08X\n", crc_xor2 );
return 0;
}
The output is as expected:
plxc15034> gcc -mcrc32 -Wall -O3 crctest.c
plxc15034> ./a.out
crc(5) = A6679B4B
crc(7) = 1900B8CA
crc(5) XOR crc(7) = BF672381
crc(5 XOR 7) = BF672381
Because this code uses the x86 CRC32 instruction, it will only run on an Intel i7 or newer. The intrinsic function takes the running CRC hash as the first parameter and the new data to accumulate as the second parameter. The return value is the new running CRC.
The initial running CRC value of 0 in the code above is critical. Using any other initial value, then CRC is not "additive" in the practical sense because you have effectively thrown away information about the integer you are dividing into. And this is exactly what's happening in your example. CRC functions never initialize that initial running CRC value to zero, but usually -1. The reason is that an initial CRC of 0 allows any number of leading 0's in the data to simply fall through without changing the running CRC value, which remains 0. So, initializing the CRC to 0 is mathematically sound, but for practical purposes of calculating hash, it's the last thing you'd want.
The CRC-32 algorithm is based on polynomial division, with some extra steps added. Pure polynomial remainder is additive.
By that, I mean: mod(poly1 + poly2, poly3) = mod(mod(poly1, poly3) + mod(poly2, poly3), poly3)
The CRC-32 algorithm builds on this, and is non-additive. To compute the CRC-32 of a byte array m:
- XOR the first 4 bytes by 0xFFFFFFFF.
- Treat earlier bytes as higher polynomial powers and treat lower order bits as higher polynomial powers. For example, the bytes 0x01 0x04 would be the polynomial x^15 + x^3.
- Multiply the polynomial by x^32.
- Take the remainder of this polynomial divided by the CRC-32 polynomial, 0x104C11DB7. The remainder polynomial has degree < 32.
- Treat lower powers as higher order bits. For example, the polynomial x^2 would be the 32-bit integer 0x40000000.
- XOR the result by 0xFFFFFFFF.
The pure polynomial remainder operation is in step #4. It's steps #1 and #6 that make the CRC-32 algorithm non-additive. So if you undo the effect of steps #1 and #6, then you can modify the CRC-32 algorithm to be additive.
(See also: Python CRC-32 woes)
If a, b, and c are the same length, CRC(a) xor CRC(b) xor CRC(c) equals CRC(a xor b xor c). Returning to your original formulation, it means that CRC(a xor b) equals CRC(a) xor CRC(b) xor CRC(z), where z is a sequence of zeroes the same length as the other two sequences.
This would imply that each bit position of the CRC result is only driven by the equivalent bit position in the input. Consider your example with B == 0.
The relationship you're describing is more likely to be true for some primitive xor or additive checksum algorithms.