What makes a user-defined class unhashable?

2019-03-22 15:35发布

问题:

The docs say that a class is hashable as long as it defines __hash__ method and __eq__ method. However:

class X(list):
  # read-only interface of `tuple` and `list` should be the same, so reuse tuple.__hash__
  __hash__ = tuple.__hash__

x1 = X()
s = {x1} # TypeError: unhashable type: 'X'

What makes X unhashable?

Note that I must have identical lists (in terms of regular equality) to be hashed to the same value; otherwise, I will violate this requirement on hash functions:

The only required property is that objects which compare equal have the same hash value

The docs do warn that a hashable object shouldn't be modified during its lifetime, and of course I don't modify instances of X after creation. Of course, the interpreter won't check that anyway.

回答1:

Simply setting the __hash__ method to that of the tuple class is not enough. You haven't actually told it how to hash any differently. tuples are hashable because they are immutable. If you really wanted to make you specific example work, it might be like this:

class X2(list):
    def __hash__(self):
        return hash(tuple(self))

In this case you are actually defining how to hash your custom list subclass. You just have to define exactly how it can generate a hash. You can hash on whatever you want, as opposed to using the tuple's hashing method:

def __hash__(self):
    return hash("foobar"*len(self))


回答2:

From the Python3 docs:

If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

Ref: object.__hash__(self)

Sample code:

class Hashable:
    pass

class Unhashable:
    def __eq__(self, other):
        return (self == other)

class HashableAgain:
    def __eq__(self, other):
        return (self == other)

    def __hash__(self):
        return id(self)

def main():
    # OK
    print(hash(Hashable()))
    # Throws: TypeError("unhashable type: 'X'",)
    print(hash(Unhashable()))  
    # OK
    print(hash(HashableAgain()))


回答3:

What you could and should do, based on your other question, is: don't subclass anything, just encapsulate a tuple. It's perfectly fine to do so in the init.

class X(object):
    def __init__(self, *args):
        self.tpl = args
    def __hash__(self):
        return hash(self.tpl)
    def __eq__(self, other):
        return self.tpl == other
    def __repr__(self):
        return repr(self.tpl)

x1 = X()
s = {x1}

which yields:

>>> s
set([()])
>>> x1
()


回答4:

If you don't modify instances of X after creation, why aren't you subclassing tuple?

But I'll point out that this actually doesn't throw an error, at least in Python 2.6.

>>> class X(list):
...     __hash__ = tuple.__hash__
...     __eq__ = tuple.__eq__
... 
>>> x = X()
>>> s = set((x,))
>>> s
set([[]])

I hesitate to say "works" because this doesn't do what you think it does.

>>> a = X()
>>> b = X((5,))
>>> hash(a)
4299954584
>>> hash(b)
4299954672
>>> id(a)
4299954584
>>> id(b)
4299954672

It's just using the object id as a hash. When you actually call __hash__ you still get an error; likewise for __eq__.

>>> a.__hash__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__hash__' for 'tuple' objects doesn't apply to 'X' object
>>> X().__eq__(X())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__eq__' for 'tuple' objects doesn't apply to 'X' object

I gather that the python internals, for some reason, are detecting that X has a __hash__ and an __eq__ method, but aren't calling them.

The moral of all this is: just write a real hash function. Since this is a sequence object, converting it to a tuple and hashing that is the most obvious approach.

def __hash__(self):
    return hash(tuple(self))