How to calculate the odds of a collision in hash a

2019-03-14 12:15发布

Say I have a hash algorithm, and it's nice and smooth (The odds of any one hash value coming up are the same as any other value).

Now say that I know that the odds of picking 2 hashes and there being a collision are (For arguments sake) 50000:1.

Now say I pick 100 hashes. How do I calculate the odds of a collision within that set of 100 values, given the odds of a collision in a set of 2?

What is the general solution to this, so that I can come up with a number of hash attempts after which the odds fall below some acceptable threshold? E.g. I can say things like "A batch of 49999 hash value creations has a high chance of collision".

5条回答
Anthone
2楼-- · 2019-03-14 12:55

First calculate the probability that there is not a collision:

hashes_picked = 100
single_collision_odds = 50000

# safe_combinations is number of ways to pick hashes that don't overlap
safe_combinations = factorial(single_collision_odds) / factorial(single_collision_odds - hashes_picked)

# all_combinations is total number of ways to pick hashes
all_combinations = single_collision_odds ** hashes_picked   

collision_chance = (all_combinations - safe_combinations) / all_combinations
查看更多
相关推荐>>
3楼-- · 2019-03-14 12:55

That sounds a lot like the Birthday Paradox to me.

You should be able to just substitute the set of possible birthdays (365) with the possible hashes (50000) and run the same calculations they present there.

If you modify the python script presented in the article for your values:

 def bp(n, d):
    v = 1.0
    for i in range(n):

         v = v * (1 - float(i)/d)
    return 1 - v

 print bp(2, 50000)

You end up with the odds of collision on two numbers of 0.00002. Around 265 samples, you have around a 50% chance of having a collision.

查看更多
叼着烟拽天下
4楼-- · 2019-03-14 13:00

and in JS

function calculate(n,k)
{

    var result =1;
    for (var i=0; i<k; i++){
        result=result*n/(n-i)
    }
    result=(1-1/result)*100;
    return result;
}
查看更多
聊天终结者
5楼-- · 2019-03-14 13:05

This is called the Birthday problem. To solve it, think instead about the odds of there being no collisions (call it pnc).

  • pnc(1) = 1
  • pnc(2) = 1 - pc(2)
  • pnc(3) = pnc(2) * pnc(2) * pnc(2)
查看更多
登录 后发表回答