Python method accessor creates new objects on each access

Posted 2019-03-26 02:00

Question:

While investigating another question, I found the following:

>>> class A:
...   def m(self): return 42
... 
>>> a = A()

This was expected:

>>> A.m == A.m
True
>>> a.m == a.m
True

But this I did not expect:

>>> a.m is a.m
False

And especially not this:

>>> A.m is A.m
False

Python seems to create a new object for each method access. Why am I seeing this behavior? That is, why can't it reuse one object per class and one per instance?

Answer 1:

Yes, Python creates new method objects for each access, because it builds a wrapper object to pass in self. This is called a bound method.
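
As a small illustration of what that wrapper holds (a sketch assuming CPython 3; the class `A` mirrors the one in the question), each access produces a distinct object that points back at the same function and the same instance. That is why `==` succeeds while `is` fails:

```python
class A:
    def m(self):
        return 42

a = A()
first = a.m    # each attribute access builds a new wrapper object
second = a.m

# Both wrappers reference the same underlying function and the same instance.
same_func = first.__func__ is second.__func__ is A.__dict__['m']
same_self = first.__self__ is second.__self__ is a

# Equality looks at __func__ and __self__; identity compares the wrappers themselves.
equal = first == second
identical = first is second

print(same_func, same_self, equal, identical)  # True True True False
```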

Python uses descriptors to do this; function objects have a __get__ method that is called whenever the function is looked up as an attribute through a class or one of its instances:

>>> A.__dict__['m'].__get__(A(), A)
<bound method A.m of <__main__.A object at 0x10c29bc10>>
>>> A().m
<bound method A.m of <__main__.A object at 0x10c3af450>>

Note that Python cannot reuse A().m; Python is a highly dynamic language, and the very act of accessing .m could trigger more code, which could alter what A().m returns the next time it is accessed.
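
To make that point concrete, here is a contrived sketch. The `Flipper` descriptor and the class `B` are hypothetical names invented for this illustration. A descriptor's `__get__` can run arbitrary code, so two consecutive lookups of the same attribute need not return equivalent objects:

```python
class Flipper:
    """Hypothetical descriptor that hands back a different callable on every access."""

    def __init__(self):
        self.count = 0

    def __get__(self, instance, owner=None):
        self.count += 1   # side effect triggered purely by attribute access
        n = self.count
        return lambda: n  # the returned callable depends on how often we were accessed


class B:
    m = Flipper()


b = B()
results = [b.m(), b.m(), b.m()]
print(results)  # [1, 2, 3]
```

An interpreter that cached the first result of `b.m` would change the behaviour of this program, which is why caching cannot be applied blindly.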

The @classmethod and @staticmethod decorators make use of this mechanism to return a method object bound to the class instead, and a plain unbound function, respectively:

>>> class Foo:
...     @classmethod
...     def bar(cls): pass
...     @staticmethod
...     def baz(): pass
... 
>>> Foo.__dict__['bar'].__get__(Foo(), Foo)
<bound method type.bar of <class '__main__.Foo'>>
>>> Foo.__dict__['baz'].__get__(Foo(), Foo)
<function Foo.baz at 0x10c2a1f80>
>>> Foo().bar
<bound method type.bar of <class '__main__.Foo'>>
>>> Foo().baz
<function Foo.baz at 0x10c2a1f80>

See the Python descriptor howto for more detail.

However, Python 3.7 adds a new LOAD_METHOD - CALL_METHOD opcode pair that replaces the LOAD_ATTR - CALL_FUNCTION opcode pair for method calls, precisely to avoid creating a new method object each time. This optimisation transforms the execution path for instance.foo() from type(instance).__dict__['foo'].__get__(instance, type(instance))() to type(instance).__dict__['foo'](instance), 'manually' passing the instance directly to the function object. The optimisation falls back to the normal attribute access path (including binding descriptors) if the attribute found is not a pure-Python function object.
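
The two execution paths described above can be written out by hand. This sketch (the class `Spam` and its method `foo` are placeholder names) only demonstrates that both paths produce the same result; it does not inspect the bytecode, whose opcode names vary between CPython versions:

```python
class Spam:
    def foo(self):
        return id(self)

instance = Spam()
func = type(instance).__dict__['foo']

# Generic path: bind through the descriptor protocol, then call the bound method.
via_binding = func.__get__(instance, type(instance))()

# Optimised path: skip the bound method and pass the instance as the first argument.
via_direct = func(instance)

print(via_binding == via_direct == instance.foo())  # True
```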



Answer 2:

Because that's the most convenient, least magical and most space efficient way of implementing bound methods.

In case you're not aware, "bound methods" refers to being able to do something like this:

f = obj.m
# ... in another place, at another time
f(args, but, not, self)
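
A runnable version of the same idea, using a hypothetical `Greeter` class made up for this example:

```python
class Greeter:  # hypothetical class, for illustration only
    def __init__(self, name):
        self.name = name

    def greet(self, greeting):
        return f"{greeting}, {self.name}!"

f = Greeter("Ada").greet   # the bound method remembers its instance
# ... in another place, at another time
message = f("Hello")       # no explicit self needed
print(message)  # Hello, Ada!
```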

Functions are descriptors. Descriptors are general objects which can behave differently when accessed as an attribute of a class or of an object. They are used to implement property, classmethod, staticmethod, and several other things. The specific operation of function descriptors is that they return themselves for class access, and return a fresh bound method object for instance access. (Actually, this is only true for Python 3; Python 2 is more complicated in this regard: it has "unbound methods", which are basically functions but not quite.)
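
The Python 3 behaviour described here can be checked directly. A short sketch, reusing the class `A` from the question:

```python
class A:
    def m(self):
        return 42

func = A.__dict__['m']

# Class access: __get__ receives instance=None and hands back the function itself.
class_access_is_function = func.__get__(None, A) is func
class_access_identical = A.m is A.m

# Instance access: a fresh bound method object each time.
a = A()
instance_access_identical = a.m is a.m

print(class_access_is_function, class_access_identical, instance_access_identical)
# True True False
```

Under Python 2, `A.m is A.m` gives False instead, because class access wraps the function in an unbound method object.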

The reason a new object is created on each access is one of simplicity and efficiency: Creating a bound method up-front for every method of every instance takes time and space. Creating them on demand and never freeing them is a potential memory leak (although CPython does something similar for other built-in types) and slightly slower in some cases. Complicated weakref-based caching schemes for method objects aren't free either, and they are significantly more complicated (historically, bound methods predate weakrefs by far).
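
For a sense of the trade-off, here is a minimal sketch of the simplest possible per-instance cache, assuming CPython 3. It is not something this answer recommends. It only shows that storing the bound method on the instance makes repeated access return one object, at the price of an extra dictionary entry per method per instance and a reference cycle (instance to bound method and back to the instance):

```python
class A:
    def m(self):
        return 42

a = A()
a.m = a.m  # store the bound method in the instance __dict__

# Functions are non-data descriptors, so the instance __dict__ entry now wins the lookup.
cached_identical = a.m is a.m

# The cached method points back at its instance: a -> a.__dict__['m'] -> a.
creates_cycle = a.m.__self__ is a

print(cached_identical, creates_cycle, a.m())  # True True 42
```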