What is the relationship between the Python datamodel and builtin functions?
- The builtins and operators use the underlying datamodel methods or attributes.
- The builtins and operators have more elegant behavior and are in general more forward compatible.
- The special methods of the datamodel are semantically non-public interfaces.
- The builtins and language operators are specifically intended to be the user interface for behavior implemented by special methods.
Thus, you should prefer to use the builtin functions and operators where possible over the special methods and attributes of the datamodel.
The semantically internal APIs are more likely to change than the public interfaces. While Python doesn't actually consider anything "private" and exposes the internals, that doesn't mean it's a good idea to abuse that access. Doing so has the following risks:
- You may find you have more breaking changes when upgrading your Python executable or switching to other implementations of Python (like PyPy, IronPython, or Jython, or some other unforeseen implementation.)
- Your colleagues will likely think poorly of your language skills and conscientiousness, and consider it a code-smell, bringing you and the rest of your code to greater scrutiny.
- The builtin functions make it easy to intercept and customize behavior. Using special methods directly limits your power to introspect and debug your Python.
In depth
The builtin functions and operators invoke the special methods and use the special attributes in the Python datamodel. They are the readable and maintainable veneer that hides the internals of objects. In general, users should use the builtins and operators given in the language as opposed to calling the special methods or using the special attributes directly.
The builtin functions and operators can also have fallback or more elegant behavior than the more primitive datamodel special methods. For example:

- `next(obj, default)` allows you to provide a default instead of raising `StopIteration` when an iterator runs out, while `obj.__next__()` does not.
- `str(obj)` falls back to `obj.__repr__()` when `obj.__str__()` isn't available, whereas calling `obj.__str__()` directly skips that fallback (and may raise an `AttributeError` on objects that don't define it).
- `obj != other` falls back to `not obj == other` in Python 3 when no `__ne__` is defined; calling `obj.__ne__(other)` would not take advantage of this.
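These fallbacks can be demonstrated directly; the class names below are invented for illustration:

```python
# next() with a default avoids StopIteration
it = iter([1])
next(it)                    # consumes the only item
print(next(it, 'done'))     # prints 'done' instead of raising StopIteration

class OnlyRepr:
    def __repr__(self):
        return 'OnlyRepr()'

print(str(OnlyRepr()))      # 'OnlyRepr()': str() fell back to __repr__

class Eq:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return self.value == other.value

print(Eq(1) != Eq(2))       # True: != negates __eq__ when no __ne__ is defined
```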
(Builtin functions can also be easily overshadowed, if necessary or desirable, in a module's global scope or the `builtins` module, to further customize behavior.)
Mapping the builtins and operators to the datamodel
Here is a mapping, with notes, of the builtin functions and operators to the special methods and attributes that they use or return. The usual rule is that a builtin maps to a special method of the same name, but this is not consistent enough to make the map below unnecessary:
builtins/ special methods/
operators -> datamodel NOTES (fb == fallback)
repr(obj) obj.__repr__() provides fb behavior for str
str(obj) obj.__str__() fb to __repr__ if no __str__
bytes(obj) obj.__bytes__() Python 3 only
unicode(obj) obj.__unicode__() Python 2 only
format(obj) obj.__format__() format spec optional.
hash(obj) obj.__hash__()
bool(obj) obj.__bool__() Python 3, fb to __len__
bool(obj) obj.__nonzero__() Python 2, fb to __len__
dir(obj) obj.__dir__()
vars(obj) obj.__dict__ does not include __slots__
type(obj) obj.__class__ type actually bypasses __class__ -
overriding __class__ will not affect type
help(obj) obj.__doc__ help uses more than just __doc__
len(obj) obj.__len__() provides fb behavior for bool
iter(obj) obj.__iter__() fb to __getitem__ w/ indexes from 0 on
next(obj) obj.__next__() Python 3
next(obj) obj.next() Python 2
reversed(obj) obj.__reversed__() fb to __len__ and __getitem__
other in obj obj.__contains__(other) fb to __iter__ then __getitem__
obj == other obj.__eq__(other)
obj != other obj.__ne__(other) fb to not obj.__eq__(other) in Python 3
obj < other obj.__lt__(other) get >, >=, <= with @functools.total_ordering
complex(obj) obj.__complex__()
int(obj) obj.__int__()
float(obj) obj.__float__()
round(obj) obj.__round__()
abs(obj) obj.__abs__()
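A few of the fallbacks noted in the table can be demonstrated directly (the class names here are made up):

```python
class Squares:
    # no __iter__: iter() falls back to __getitem__ with indexes from 0
    def __getitem__(self, i):
        if i < 5:
            return i * i
        raise IndexError(i)

print(list(iter(Squares())))  # [0, 1, 4, 9, 16]

class Empty:
    # no __bool__: bool() falls back to __len__
    def __len__(self):
        return 0

print(bool(Empty()))  # False

class Liar:
    # type() bypasses __class__, but isinstance() consults it
    @property
    def __class__(self):
        return int

obj = Liar()
print(isinstance(obj, int))  # True
print(type(obj) is Liar)     # True
```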
The `operator` module provides `length_hint`, which falls back to `__length_hint__` when `__len__` is not implemented:
length_hint(obj) obj.__length_hint__()
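A quick sketch of that fallback order (`Stream` is an invented name):

```python
from operator import length_hint

class Stream:
    # no __len__; provide an estimate instead
    def __length_hint__(self):
        return 10

print(length_hint(Stream()))     # 10, from __length_hint__
print(length_hint([1, 2, 3]))    # 3, __len__ wins when it exists
print(length_hint(object(), 0))  # 0, the default when neither is defined
```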
Dotted Lookups
Dotted lookups are contextual. Without special method implementations, a lookup first searches the class hierarchy for data descriptors (like properties and slots), then the instance `__dict__` (for instance variables), then the class hierarchy for non-data descriptors (like methods). The special methods customize this behavior:
obj.attr obj.__getattr__('attr') provides fb if dotted lookup fails
obj.attr obj.__getattribute__('attr') preempts dotted lookup
obj.attr = _ obj.__setattr__('attr', _) preempts dotted lookup
del obj.attr obj.__delattr__('attr') preempts dotted lookup
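The difference between a fallback and a preemption can be seen with `__getattr__` (`Fallback` is an invented name):

```python
class Fallback:
    # __getattr__ is only called when the normal dotted lookup fails
    def __getattr__(self, name):
        return '<missing %s>' % name

f = Fallback()
f.x = 1
print(f.x)  # 1: found in the instance __dict__, __getattr__ not called
print(f.y)  # '<missing y>': normal lookup failed, so __getattr__ ran
```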
Descriptors
Descriptors are a bit advanced - feel free to skip these entries and come back later. Recall that the descriptor instance lives in the class hierarchy (like methods, slots, and properties). A data descriptor implements either `__set__` or `__delete__`:
obj.attr descriptor.__get__(obj, type(obj))
obj.attr = val descriptor.__set__(obj, val)
del obj.attr descriptor.__delete__(obj)
When the class is instantiated (that is, defined), the descriptor method `__set_name__` is called on any descriptor that has it, to inform the descriptor of its attribute name. (This is new in Python 3.6.) Below, `cls` is the same as `type(obj)` above, and `'attr'` stands in for the attribute name:
class cls:
    @descriptor_type
    def attr(self): pass  # -> descriptor.__set_name__(cls, 'attr')
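Putting those pieces together, here is a minimal data descriptor sketch; `Positive` and `Account` are invented names:

```python
class Positive:
    def __set_name__(self, owner, name):
        self.name = name  # called at class creation time (Python 3.6+)
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__[self.name]
    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError('%s must be positive' % self.name)
        obj.__dict__[self.name] = value

class Account:
    balance = Positive()  # Positive.__set_name__(Account, 'balance') runs here

a = Account()
a.balance = 10    # routed through Positive.__set__
print(a.balance)  # 10, via Positive.__get__
```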
Items (subscript notation)
The subscript notation is also contextual:
obj[name] -> obj.__getitem__(name)
obj[name] = item -> obj.__setitem__(name, item)
del obj[name] -> obj.__delitem__(name)
A special case for subclasses of `dict`: `__missing__` is called if `__getitem__` doesn't find the key:
obj[name] -> obj.__missing__(name)
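For example (`ZeroDict` is an invented name):

```python
class ZeroDict(dict):
    # __missing__ runs only when __getitem__ doesn't find the key
    def __missing__(self, key):
        return 0

c = ZeroDict(a=1)
print(c['a'])  # 1: found normally
print(c['b'])  # 0: __getitem__ failed, so __missing__ was called
```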
Operators
There are also special methods for the `+, -, *, @, /, //, %, divmod(), pow(), **, <<, >>, &, ^, |` operators, for example:
obj + other -> obj.__add__(other), fallback to other.__radd__(obj)
obj | other -> obj.__or__(other), fallback to other.__ror__(obj)
and in-place operators for augmented assignment, `+=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=`, for example:
obj += other -> obj.__iadd__(other)
obj |= other -> obj.__ior__(other)
and unary operations:
+obj -> obj.__pos__()
-obj -> obj.__neg__()
~obj -> obj.__invert__()
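The reflected fallback (`__radd__` and friends) kicks in when the left operand returns `NotImplemented`; `Meters` below is an invented example class:

```python
class Meters:
    def __init__(self, n):
        self.n = n
    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.n + other.n)
        return NotImplemented
    def __radd__(self, other):
        # reached when the left operand's __add__ returned NotImplemented
        if other == 0:  # lets sum() work, since sum starts from 0
            return Meters(self.n)
        return NotImplemented
    def __repr__(self):
        return 'Meters(%r)' % self.n

print(Meters(2) + Meters(3))        # Meters(5)
print(sum([Meters(1), Meters(2)]))  # Meters(3): 0 + Meters(1) used __radd__
```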
Context Managers
A context manager defines `__enter__`, which is called on entering the code block (its return value, usually `self`, is aliased with `as`), and `__exit__`, which is guaranteed to be called on leaving the code block, with exception information:
with obj as cm:                 ->  cm = obj.__enter__()
    raise Exception('message')  ->  obj.__exit__(Exception, Exception('message'), traceback_object)
If `__exit__` gets an exception and then returns a false value, the exception is re-raised on leaving the method. If there is no exception, `__exit__` gets `None` for all three arguments instead, and its return value is meaningless:
with obj:  ->  obj.__enter__()
    pass
           ->  obj.__exit__(None, None, None)
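A small sketch showing how `__exit__`'s return value controls re-raising (`Suppressing` is an invented name):

```python
class Suppressing:
    def __enter__(self):
        return self
    def __exit__(self, exc_type, exc_value, traceback):
        # a true return value swallows the exception; a false one re-raises it
        return exc_type is ValueError

with Suppressing():
    raise ValueError('swallowed, because __exit__ returned True')
print('still running')
```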
Some Metaclass Special Methods
Similarly, classes can have special methods (from their metaclasses) that support abstract base classes:
isinstance(obj, cls) -> cls.__instancecheck__(obj)
issubclass(sub, cls) -> cls.__subclasscheck__(sub)
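A minimal sketch of such a hook, with invented names:

```python
class DuckMeta(type):
    # metaclass overriding the isinstance() hook
    def __instancecheck__(cls, obj):
        return hasattr(obj, 'quack')

class Duck(metaclass=DuckMeta):
    pass

class Mallard:
    def quack(self):
        return 'quack'

print(isinstance(Mallard(), Duck))  # True: isinstance called __instancecheck__
print(isinstance(object(), Duck))   # False
```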
An important takeaway is that while builtins like `next` and `bool` did not change between Python 2 and 3, the underlying implementation names did change. Thus using the builtins also offers more forward compatibility.
When am I supposed to use the special names?
In Python, names that begin with underscores are semantically non-public names for users. The underscore is the creator's way of saying, "hands-off, don't touch."
This is not just cultural; it is also reflected in Python's treatment of APIs. When a package's `__init__.py` uses `import *` to provide an API from a subpackage, and the subpackage does not provide an `__all__`, names that start with underscores are excluded. (The subpackage's `__name__` would also be excluded.)
IDE autocompletion tools are mixed in whether they consider names that start with underscores to be non-public. However, I greatly appreciate not seeing `__init__`, `__new__`, `__repr__`, `__str__`, `__eq__`, etc. (nor any user-created non-public interfaces) when I type the name of an object and a period.
Thus I assert:
The special "dunder" methods are not a part of the public interface. Avoid using them directly.
So when to use them?
The main use-case is when implementing your own custom object or subclass of a builtin object.
Try to only use them when absolutely necessary. Here are some examples:
Use the `__name__` special attribute on functions or classes
When we decorate a function, we typically get a wrapper function in return that hides helpful information about the original function. We would use the `@wraps(fn)` decorator to make sure we don't lose that information, but if we need the name of the function, we need to use the `__name__` attribute directly:
from functools import wraps

def decorate(fn):
    @wraps(fn)
    def decorated(*args, **kwargs):
        print('calling fn,', fn.__name__)  # exception to the rule
        return fn(*args, **kwargs)
    return decorated
Similarly, I do the following when I need the name of the object's class in a method (used in, for example, a `__repr__`):
def get_class_name(self):
    return type(self).__name__
    #      ^          ^- must use __name__; there is no builtin, e.g. name()
    #      use type(), not self.__class__
Using special attributes to write custom classes or subclassed builtins
When we want to define custom behavior, we must use the data-model names.
This makes sense, since we are the implementors, these attributes aren't private to us.
class Foo(object):
    # required here to implement == for instances:
    def __eq__(self, other):
        # but we still use == for the values:
        return self.value == other.value
    # required here to implement != for instances:
    def __ne__(self, other):  # the docs recommend this for Python 2
        # use the higher level of abstraction here:
        return not self == other
However, even in this case, we don't use `self.value.__eq__(other.value)` or `not self.__eq__(other)` (see my answer here for proof that the latter can lead to unexpected behavior). Instead, we should use the higher level of abstraction.
Another point at which we'd need to use the special method names is when we are in a child's implementation, and want to delegate to the parent. For example:
class NoisyFoo(Foo):
    def __eq__(self, other):
        print('checking for equality')
        # required here to call the parent's method
        return super(NoisyFoo, self).__eq__(other)
Conclusion
The special methods allow users to implement the interface for object internals.
Use the builtin functions and operators wherever you can. Only use the special methods where there is no documented public API.
I'll show some usage that you apparently didn't think of, comment on the examples you showed, and argue against the privacy claim from your own answer.
I agree with your own answer that, for example, `len(a)` should be used, not `a.__len__()`. I'd put it like this: `len` exists so we can use it, and `__len__` exists so `len` can use it. Or however that really works internally, since `len(a)` can actually be much faster, at least for lists and strings:
>>> timeit('len(a)', 'a = [1,2,3]', number=10**8)
4.22549770486512
>>> timeit('a.__len__()', 'a = [1,2,3]', number=10**8)
7.957335462257106
>>> timeit('len(s)', 's = "abc"', number=10**8)
4.1480574509332655
>>> timeit('s.__len__()', 's = "abc"', number=10**8)
8.01780160432645
But besides defining these methods in my own classes for usage by builtin functions and operators, I occasionally also use them as follows:
Let's say I need to give a filter function to some function, and I want to use a set `s` as the filter. I'm not going to create an extra function `lambda x: x in s` or `def f(x): return x in s`. No. I already have a perfectly fine function that I can use: the set's `__contains__` method. It's simpler and more direct. And even faster, as shown here (ignore that I save it as `f` here, that's just for this timing demo):
>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = s.__contains__', number=10**8)
6.473739433621368
>>> timeit('f(2); f(4)', 's = {1, 2, 3}; f = lambda x: x in s', number=10**8)
19.940786514456924
>>> timeit('f(2); f(4)', 's = {1, 2, 3}\ndef f(x): return x in s', number=10**8)
20.445680107760325
So while I don't directly call magic methods like `s.__contains__(x)`, I do occasionally pass them somewhere, like `some_function_needing_a_filter(s.__contains__)`. And I think that's perfectly fine, and better than the lambda/def alternative.
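As a concrete sketch of that pattern with the builtin `filter`:

```python
s = {1, 2, 3}
# pass the bound method itself, instead of wrapping it in a lambda
print(list(filter(s.__contains__, [0, 1, 2, 3, 4])))  # [1, 2, 3]
```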
My thoughts on the examples you showed:
- Example 1: Asked how to get the size of a list, he answered `items.__len__()`. Without any reasoning, even. My verdict: that's just wrong. It should be `len(items)`.
- Example 2: Does mention `d[key] = value` first! And then adds `d.__setitem__(key, value)` with the reasoning "if your keyboard is missing the square bracket keys", which rarely applies and which I doubt was serious. I think it was just the foot in the door for the last point: mentioning that that's how we can support the square bracket syntax in our own classes. Which turns it back into a suggestion to use square brackets.
- Example 3: Suggests `obj.__dict__`. Bad, like the `__len__` example. But I suspect he just didn't know `vars(obj)`, and I can understand it, as `vars` is less common/known and its name differs from the "dict" in `__dict__`.
- Example 4: Suggests `__class__`. Should be `type(obj)`. I suspect it's similar to the `__dict__` story, although I think `type` is more well-known.
About privacy: In your own answer you say these methods are "semantically private". I strongly disagree. Single and double leading underscores are for that, but not the data model's special "dunder/magic" methods with double leading+trailing underscores.
- The two things you use as arguments are import behaviour and IDE autocompletion. But importing and these special methods are different areas, and the one IDE I tried (the popular PyCharm) disagrees with you. I created a class/object with methods `_foo` and `__bar__`, and then autocompletion didn't offer `_foo` but did offer `__bar__`. And when I used both methods anyway, PyCharm only warned me about `_foo` (calling it a "protected member"), not about `__bar__`.
- PEP 8 says 'weak "internal use" indicator' explicitly for single leading underscore, and explicitly for double leading underscores it mentions the name mangling and later explains that it's for "attributes that you do not want subclasses to use". But the comment about double leading+trailing underscores doesn't say anything like that.
- The data model page you yourself link to says that these special method names are "Python’s approach to operator overloading". Nothing about privacy there. The words private/privacy/protected don't even appear anywhere on that page.
I also recommend reading this article by Andrew Montalenti about these methods, emphasizing that "The dunder convention is a namespace reserved for the core Python team" and "Never, ever, invent your own dunders" because "The core Python team reserved a somewhat ugly namespace for themselves". Which all matches PEP 8's instruction "Never invent [dunder/magic] names; only use them as documented". I think Andrew is spot on - it's just an ugly namespace of the core team. And it's for the purpose of operator overloading, not about privacy (not Andrew's point but mine and the data model page's).
Besides Andrew's article I also checked several more about these "magic"/"dunder" methods, and I found none of them talking about privacy at all. That's just not what this is about.
Again, we should use `len(a)`, not `a.__len__()`. But not because of privacy.