Why doesn't Python hash lists using ID?

2019-02-09 21:18发布

问题:

When using a dictionary in Python, the following is impossible:

d = {}
d[[1,2,3]] = 4

since 'list' is an unhashable type. However, the id function in Python returns an integer for an object that is guaranteed to be unique for the object's lifetime.

Why doesn't Python use id to hash a dictionary? Are there drawbacks?

回答1:

The reason is right here (Why must dictionary keys be immutable)

Some unacceptable solutions that have been proposed:

  • Hash lists by their address (object ID). This doesn’t work because if you construct a new list with the same value it won’t be found; e.g.:

    mydict = {[1, 2]: '12'}

    print mydict[[1, 2]]

would raise a KeyError exception because the id of the [1, 2] used in the second line differs from that in the first line. In other words, dictionary keys should be compared using ==, not using is.



回答2:

It is a requirement that if a == b, then hash(a) == hash(b). Using the id can break this, because the ID will not change if you mutate the list. Then you might have two lists that have equal contents, but have different hashes.

Another way to look at it is, yes, you could do it, but it would mean that you could not retrieve the dict value with another list with the same contents. You could only retrieve it by using the exact same list object as the key.



回答3:

In Python dictionaries keys are compared using ==, and the equality operator with lists does an item-by-item equality check so two different lists with the same elements compare equal and they must behave as the same key in a dictionary.

If you need to keep a dictionary or set of lists by identity instead of equality you can just wrap the list in a user-defined object or, depending on the context, may be you can use a dictionary where elements are stored/retrieve by using id explicitly.

Note however that keeping the id of an object stored doesn't imply the object will remain alive, that there is no way for going from id to object and that id may be reused over time for objects that have been garbage collected. A solution is to use

my_dict[id(x)] = [x, value]

instead of

my_dict[id(x)] = value