I need a hash over pretty large files stored on a distributed FS. I can process parts of a file with much better performance than the whole file, so I'd like to be able to calculate the hash over the parts and then combine the results.
I'm thinking about CRC64 as the hashing algorithm, but I have no clue how to use its theoretical 'linear function' property to combine CRCs over parts of a file. Any recommendations? Anything I've missed here?
Additional notes on why I'm looking at CRC64:
- I can control the file blocks, but because of the nature of the application they have to be of varying sizes (as small as 1 byte; no fixed block size is possible).
- I know about the CRC32 implementation in zlib, which includes a way to combine CRCs over parts (crc32_combine()), but I'd like something wider. 8 bytes looks nice to me.
- I know CRC is pretty fast, and I'd like to benefit from that, since the files can be really huge (up to a few GB).
I decided that this was generally useful enough to write and make available:
OK, my contribution to this: I ported it to Java. So here is the code:
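The code itself is not included in this copy of the answer. As a sketch of what such a Java port looks like, here is zlib's crc32_combine() GF(2)-matrix technique widened to 64 bits, together with a simple bitwise CRC-64 so the example is self-contained. I'm assuming the reflected CRC-64/XZ polynomial (0xC96C5795D7870F42) with initial value and final XOR of all ones; the original port may use a different polynomial or table-driven CRC code.

```java
public class Crc64Combine {
    // Reflected polynomial of CRC-64/XZ (based on ECMA-182).
    static final long POLY = 0xC96C5795D7870F42L;

    // Simple (slow) bitwise CRC-64 with init and final XOR of all ones.
    // Pass 0 to start; pass a previous result as `crc` to continue streaming.
    static long crc64(long crc, byte[] data) {
        crc = ~crc;
        for (byte b : data) {
            crc ^= (b & 0xFFL);
            for (int k = 0; k < 8; k++)
                crc = (crc >>> 1) ^ (((crc & 1) != 0) ? POLY : 0);
        }
        return ~crc;
    }

    // Multiply the GF(2) 64x64 matrix `mat` by the vector `vec`.
    static long gf2Times(long[] mat, long vec) {
        long sum = 0;
        for (int i = 0; vec != 0; i++, vec >>>= 1)
            if ((vec & 1) != 0) sum ^= mat[i];
        return sum;
    }

    // square = mat * mat over GF(2).
    static void gf2Square(long[] square, long[] mat) {
        for (int n = 0; n < 64; n++)
            square[n] = gf2Times(mat, mat[n]);
    }

    // Given crc1 = CRC64(A) and crc2 = CRC64(B), return CRC64(A || B),
    // where len2 is the length of B in bytes. This is zlib's
    // crc32_combine() logic with 64 matrix rows instead of 32.
    static long crc64Combine(long crc1, long crc2, long len2) {
        if (len2 <= 0) return crc1;
        long[] even = new long[64];
        long[] odd = new long[64];

        // odd = operator matrix for appending one zero bit.
        odd[0] = POLY;
        long row = 1;
        for (int n = 1; n < 64; n++) { odd[n] = row; row <<= 1; }

        gf2Square(even, odd);   // even = operator for two zero bits
        gf2Square(odd, even);   // odd  = operator for four zero bits

        // Append len2 zero *bytes* to crc1 by walking the binary
        // expansion of len2, squaring the operator each step.
        do {
            gf2Square(even, odd);
            if ((len2 & 1) != 0) crc1 = gf2Times(even, crc1);
            len2 >>>= 1;
            if (len2 == 0) break;
            gf2Square(odd, even);
            if ((len2 & 1) != 0) crc1 = gf2Times(odd, crc1);
            len2 >>>= 1;
        } while (len2 != 0);

        return crc1 ^ crc2;
    }

    public static void main(String[] args) {
        byte[] a = "hello ".getBytes();
        byte[] b = "world".getBytes();
        long whole = crc64(0, "hello world".getBytes());
        long combined = crc64Combine(crc64(0, a), crc64(0, b), b.length);
        System.out.println(whole == combined);  // prints true
    }
}
```

Each block can then be hashed independently (and in parallel), and the per-block CRCs folded left to right with crc64Combine(), passing each block's length; the combine step is O(log len2), so it's cheap even for multi-GB files.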