Optimise updating MD5/SHA1 with streams of zeros

2019-05-05 09:55发布

问题:

Is it possible to optimise the function:

MD5_Update(&ctx_d, buf, num);

if you know that buf contains only zeros?

Or is this mathematically impossible?

Likewise for SHA1.

回答1:

If you control the input of the hash function then you could use a simple count instead of all the zero's, maybe using some kind of escape. E.g. 000020 in hex could mean 32 zero's. A (very) basic compression function may be much faster than MD5 or SHA1.

Obviously this solution will only be faster if you save one or more blocks of hash calculations. E.g. it does not matter if you hash 3 bytes or 16 bytes, as the input will be padded and expanded by the hash function before it is used.



回答2:

Cryptographic hashes are actually supposed to produce significant changes in output for small changes in input, see http://en.wikipedia.org/wiki/Avalanche_effect . It sounds like you're looking for some relationship between some hashed data, and some hashed data pre-padded with zeros. By design this change in your input should produce output that isn't clearly related.

EDIT: To answer your question directly, by design "a small change in either the key or the plaintext should cause a drastic change in the ciphertext" which means it's meant to be mathematically difficult to do.



回答3:

You'd probably get some speedup, but it'd be relatively minor. The most important thing for high performance hashing is choosing an optimized implementation, and to use GPUs(or even FPGA/ASIC) to exploit parallelism if that's possible.

There is a known speedup for SHA-1 with fixed IV and messages that differ only a little. That speedup is around 21%. See New attack makes some password cracking faster - Ars Technica.

You might get a similar speedup when you have a completely fixed message but a variable IV. But it'd be a lot of work to implement this, especially as a non expert. Buying additional hardware is probably much cheaper than speeding up your code a few percent.


If the beginning of your message consists of multiple constant blocks, you can hash them once, and cache the intermediate state of the hashfunction. Might or might not be applicable to your situation.