I am reading the source code of pyspark and got really confused by the portable_hash function below.
```python
import sys  # portable_hash relies on sys.maxint, which only exists on Python 2


def portable_hash(x):
    """
    This function returns consistent hash code for builtin types, especially
    for None and tuple with None.

    The algorithm is similar to that one used by CPython 2.7

    >>> portable_hash(None)
    0
    >>> portable_hash((None, 1)) & 0xffffffff
    219750521
    """
    if x is None:
        return 0
    if isinstance(x, tuple):
        h = 0x345678
        for i in x:
            # hash each element recursively and fold it into the running value
            h ^= portable_hash(i)
            h *= 1000003
            h &= sys.maxint  # keep h within the platform's signed integer range
        h ^= len(x)
        if h == -1:  # -1 is reserved as an error indicator in CPython's C API
            h = -2
        return h
    return hash(x)
```
I can see that it is a recursive function: if the input is a tuple, it recursively loops through every element.
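For example (my own test snippet, not part of the pyspark source), the recursion means nested tuples are handled element by element, while a list simply falls through to the built-in `hash` and raises `TypeError`:

```python
# Assumes portable_hash is defined as quoted above (on Python 2, because of sys.maxint).

nested = (1, (2, (3, None)))
print(portable_hash(nested))  # inner tuples are hashed by recursive calls

# The result is deterministic for the same input within a run.
assert portable_hash(nested) == portable_hash(nested)

# Lists get no special treatment, so they hit hash() and fail as usual.
try:
    portable_hash([1, 2, 3])
except TypeError as exc:
    print("unhashable:", exc)  # unhashable type: 'list'
```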
Here are a few of my questions:
- Is this hash approach a one-to-one mapping?
- This function only takes None, tuples, and hashable values into consideration. I know that list objects are not hashable; did they do that intentionally or not?
- I don't have much experience with hashing. Is this a very classic hashing approach, and if so, is there any resource for me to get a better understanding of it?
NO hashing approach is 1-to-1: they all map M possible inputs into N possible integer results where M is much larger than N.
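To make the pigeonhole point concrete, here are a couple of collisions that are easy to check in CPython (my own examples, not from the pyspark source):

```python
# Equal values must hash equal, so True and 1 collide by design.
print(hash(True) == hash(1))  # True

# CPython reserves -1 as an error value in its C API, so hash() never
# returns -1; -1 and -2 therefore collide, and portable_hash inherits that.
print(hash(-1) == hash(-2))                    # True in CPython
print(portable_hash(-1) == portable_hash(-2))  # True as well
```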
Yes, this function delegates to the built-in `hash` for everything else except tuples and `None`. And it's definitely a deliberate design decision in Python (also respected by this function) to make mutable built-ins such as `list` and `dict` not hashable.

Yes, exclusive-or'ing the hash of each item, with modification of the running total as one goes through, is indeed a very classic approach to hashing containers of hashable items.
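As a rough sketch of that classic pattern (a hypothetical, standalone helper, not pyspark code), you XOR each element's hash into a running value, multiply by a large odd constant so that element order matters, and finally mix in the length:

```python
import sys


def combine_hashes(items):
    """Hash a sequence of hashable items, mimicking the tuple branch above."""
    h = 0x345678
    for item in items:
        h ^= hash(item)   # fold in the element's hash
        h *= 1000003      # large odd multiplier makes element order matter
        h &= sys.maxsize  # sys.maxint on Python 2; keeps the value bounded
    h ^= len(items)       # mix in the length, as the original algorithm does
    if h == -1:           # -1 is reserved as an error value in CPython
        h = -2
    return h


print(combine_hashes((1, 2, 3)) == combine_hashes((1, 2, 3)))  # True: deterministic
print(combine_hashes((1, 2, 3)) == combine_hashes((3, 2, 1)))  # False: order matters
```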
For more study about hashing, I'd start at http://en.wikipedia.org/wiki/Hash_function .