Generalised Birthday Calculation Given Hash Length

2019-07-21 15:28发布

Let us assume that we are given the following:

  • The length of the hash
  • The chance of obtaining a collision

Now, knowing the above, how can we obtain the number of "samples" needed to obtain the given chance percentage?

2条回答
Rolldiameter
2楼-- · 2019-07-21 15:42

here is a little javascript function to calculate the chance, based on the "Simplified Approximations" algorithm from https://preshing.com/20110504/hash-collision-probabilities/ (thanks for the link @Frank ) to calculate the chance of collision, and using the Decimal.js bignum library to manage bigger numbers than Javascript's Number can handle, example:

samples=2**64; //
hash_size_bytes=20; // 160 bit hash
number_of_possible_hashes=Decimal("2").pow(8*hash_size_bytes);
console.log(collision_chance(samples,number_of_possible_hashes));
// ~ 0.00000001 % chance of a collision with 2**64 samples and 20-byte-long hashes.
samples=77163;
hash_size_bytes=4; // 32bit hash
number_of_possible_hashes=Decimal("2").pow(8*hash_size_bytes);
console.log(collision_chance(samples,number_of_possible_hashes));
// ~ 49.999% chance of a collision for a 4-byte hash with 77163 samples.

function:

// with https://github.com/MikeMcl/decimal.js/blob/master/decimal.min.js
function collision_chance(samples,number_of_possible_hashes){
    var Decimal100 = Decimal.clone({ precision: 100, rounding: 8 });
    var k=Decimal100(samples);
    var N=Decimal100(number_of_possible_hashes);
    var MinusK=Decimal100(samples);
    MinusK.s=-1;
    var ret=((MinusK.mul(k.sub(1))).div(N.mul(2))).exp();
    ret=ret.mul(100);
    ret=Decimal100(100).sub(ret);
    return ret.toFixed(100);
}
查看更多
太酷不给撩
3楼-- · 2019-07-21 15:49

When we take the Simplified formula for the birthday paradox we get:

probability = k^2/2N

So:

sqr(probability*2*n) = k

Where we know that n = 2^lenghtHash

A small test: Hash = 16 bit : N= 65536 probability = 50% = 0.5

sqr(0.5*2*65536) = 256 samples

This is not 100% correct as we started of with the Simplified formula, but for big hashes and lager sample sets it gets very close.

for a link on the formula you can look here.

查看更多
登录 后发表回答