class A(object):
def __cmp__(self):
print '__cmp__'
return object.__cmp__(self)
def __eq__(self, rhs):
print '__eq__'
return True
a1 = A()
a2 = A()
print a1 in set([a1])
print a1 in set([a2])
Why does first line prints True, but second prints False? And neither enters operator eq?
I am using Python 2.6
You need to define __hash__
too. For example
class A(object):
def __hash__(self):
print '__hash__'
return 42
def __cmp__(self, other):
print '__cmp__'
return object.__cmp__(self, other)
def __eq__(self, rhs):
print '__eq__'
return True
a1 = A()
a2 = A()
print a1 in set([a1])
print a1 in set([a2])
Will work as expected.
As a general rule, any time you implement __cmp__
you should implement a __hash__
such that for all x
and y
such that x == y
, x.__hash__() == y.__hash__()
.
Set __contains__ makes checks in the following order:
'Match' if hash(a) == hash(b) and (a is b or a==b) else 'No Match'
The relevant C source code is in Objects/setobject.c::set_lookkey() and in Objects/object.c::PyObject_RichCompareBool().
Sets and dictionaries gain their speed by using hashing as a fast approximation of full equality checking. If you want to redefine equality, you usually need to redefine the hash algorithm so that it is consistent.
The default hash function uses the identity of the object, which is pretty useless as a fast approximation of full equality, but at least allows you to use an arbitrary class instance as a dictionary key and retrieve the value stored with it if you pass exactly the same object as a key. But it means if you redefine equality and don't redefine the hash function, your objects will go into a dictionary/set without complaining about not being hashable, but still won't actually work the way you expect them to.
See the official python docs on __hash__
for more details.
A tangential answer, but your question and my testing made me curious. If you ignore the set operator which is the source of your __hash__
problem, it turns out your question is still interesting.
Thanks to the help I got on this SO question, I was able to chase the in operator through the source code to it's root. Near the bottom I found the PyObject_RichCompareBool function which indeed tests for identity (see the comment about "Quick result") before testing for equality.
So unless I misunderstand the way things work, the technical answer to your question is first identity and then equality, through the equality test itself. Just to reiterate, that is not the source of the behavior you were seeing but just the technical answer to your question.
If I misunderstood the source, somebody please set me straight.
int
PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)
{
PyObject *res;
int ok;
/* Quick result when objects are the same.
Guarantees that identity implies equality. */
if (v == w) {
if (op == Py_EQ)
return 1;
else if (op == Py_NE)
return 0;
}
res = PyObject_RichCompare(v, w, op);
if (res == NULL)
return -1;
if (PyBool_Check(res))
ok = (res == Py_True);
else
ok = PyObject_IsTrue(res);
Py_DECREF(res);
return ok;
}
Sets seem to use hash codes, then identity, before comparing for equality. The following code:
class A(object):
def __eq__(self, rhs):
print '__eq__'
return True
def __hash__(self):
print '__hash__'
return 1
a1 = A()
a2 = A()
print 'set1'
set1 = set([a1])
print 'set2'
set2 = set([a2])
print 'a1 in set1'
print a1 in set1
print 'a1 in set2'
print a1 in set2
outputs:
set1
__hash__
set2
__hash__
a1 in set1
__hash__
True
a1 in set2
__hash__
__eq__
True
What happens seems to be:
- The hash code is computed when an element is inserted into a hash. (To compare with the existing elements.)
- The hash code for the object you're checking with the
in
operator is computed.
- Elements of the set with the same hash code are inspected by first checking whether they're the same object as the one you're looking for, or if they're logically equal to it.