Expected number of hash collisions

Posted 2019-01-19 00:05

I feel like I'm way overthinking this problem, but here goes anyway...

I have a hash table with M slots in its internal array. I need to insert N elements into the hash table. Assuming that I have a hash function that randomly inserts an element into a slot with equal probability for each slot, what's the expected value of the total number of hash collisions?

(Sorry that this is more of a math question than a programming question).

Edit: Here's some code I have to simulate it using Python. I'm getting numerical answers, but having trouble generalizing it to a formula and explaining it.

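import random

N = 5  # number of elements to insert
M = 8  # number of slots in the table

NUM_ITER = 100000

def get_collisions(table):
    # Every element beyond the first in a slot counts as one collision.
    col = 0
    for item in table:
        if item > 1:
            col += (item-1)
    return col

def run():
    table = [0 for x in range(M)]

    for i in range(N):
        # Pick a slot uniformly at random.
        table[random.randrange(M)] += 1

    return get_collisions(table)

# Main

total = 0
for i in range(NUM_ITER):
    total += run()

# Average number of collisions per trial (Python 3 true division).
print(total / NUM_ITER)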

Tags: python hash collision

2 Answers

戒情不戒烟

#2 · 2019-01-19 00:37

The formula for the SUM(x*(x+1)/2) metric can be found here, and the expected value appears to be n/(2m) * (n + 2m - 1).

Don't know about the variance, IANAM.
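As a rough check of the formula above, here is a minimal sketch that simulates the SUM(x*(x+1)/2) metric, where x is the number of elements in each slot, and compares it to n/(2m) * (n + 2m - 1). Note that this metric is not the same quantity as the collision count in the question's simulation, which adds x - 1 for every slot holding more than one element. The function names, the fixed seed, and the iteration count are assumptions made for illustration.

```python
import random


def expected_triangular_metric(n, m):
    """Closed form from the answer: n/(2m) * (n + 2m - 1)."""
    return n / (2 * m) * (n + 2 * m - 1)


def simulate_triangular_metric(n, m, num_iter=100_000, seed=0):
    """Average of sum(x * (x + 1) / 2) over the slots, across random trials."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    total = 0
    for _ in range(num_iter):
        table = [0] * m
        for _ in range(n):
            table[rng.randrange(m)] += 1  # uniform choice of slot
        total += sum(x * (x + 1) // 2 for x in table)
    return total / num_iter


if __name__ == "__main__":
    print(expected_triangular_metric(5, 8))
    print(simulate_triangular_metric(5, 8))
```

With the question's values N = 5 and M = 8, the two printed numbers should agree closely if the formula holds.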


三岁会撩人

#3 · 2019-01-19 00:43

You'll find the answer here: Quora.com. The expected number of collisions for m buckets and n inserts is

n - m * (1 - ((m-1)/m)^n).
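One way to see where this comes from: a given slot stays empty after n inserts with probability ((m-1)/m)^n, so the expected number of occupied slots is m * (1 - ((m-1)/m)^n). The first element to land in each occupied slot is not a collision, and every other insert is, which gives n minus the expected number of occupied slots. The sketch below compares the closed form against a simulation that counts collisions the same way as the code in the question; the function names, seed, and iteration count are assumptions made for illustration.

```python
import random


def expected_collisions(n, m):
    """Closed form: n minus the expected number of occupied slots."""
    return n - m * (1 - ((m - 1) / m) ** n)


def simulate_collisions(n, m, num_iter=100_000, seed=0):
    """Average number of collisions over random trials."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatable runs
    total = 0
    for _ in range(num_iter):
        # Set of distinct slots hit by the n inserts.
        occupied = {rng.randrange(m) for _ in range(n)}
        # Every insert beyond the first into a slot is a collision.
        total += n - len(occupied)
    return total / num_iter


if __name__ == "__main__":
    print(expected_collisions(5, 8))
    print(simulate_collisions(5, 8))
```

For N = 5 and M = 8 the closed form evaluates to about 1.1033, and the simulated average should land close to it.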

