As I read Python answers on Stack Overflow, I continue to see some people telling users to use the data model's special methods or attributes directly.
I then see contradicting advice (sometimes from myself) saying not to do that, and instead to use builtin functions and the operators directly.
Why is that? What is the relationship between the special "dunder" methods and attributes of the Python data model and builtin functions?
When am I supposed to use the special names?
What is the relationship between the Python datamodel and builtin functions?
Thus, you should prefer to use the builtin functions and operators where possible over the special methods and attributes of the datamodel.
The semantically internal APIs are more likely to change than the public interfaces. While Python doesn't actually consider anything "private" and exposes the internals, that doesn't mean it's a good idea to abuse that access. Doing so has the following risks:
In depth
The builtin functions and operators invoke the special methods and use the special attributes in the Python datamodel. They are the readable and maintainable veneer that hides the internals of objects. In general, users should use the builtins and operators given in the language as opposed to calling the special methods or using the special attributes directly.
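As a rough illustration of that relationship, here is a minimal sketch. The `Playlist` class and its attribute names are hypothetical and exist only for this example; the point is that `len()` and `+` reach the special methods for you.

```python
class Playlist:
    """Hypothetical container, used only for illustration."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # Invoked by len(playlist).
        return len(self.songs)

    def __add__(self, other):
        # Invoked by playlist + other.
        if not isinstance(other, Playlist):
            return NotImplemented
        return Playlist(self.songs + other.songs)


morning = Playlist(["Intro", "Sunrise"])
evening = Playlist(["Dusk"])

# Preferred: the builtin function and the operator.
combined = morning + evening
combined_length = len(combined)

# Gives the same result here, but is harder to read and goes around
# the interface that the builtins provide.
combined_direct = morning.__add__(evening)
combined_direct_length = combined_direct.__len__()
```

Both spellings produce the same playlist in this simple case; the builtin and operator forms are simply the ones meant to be read and written by users.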
The builtin functions and operators also can have fallback or more elegant behavior than the more primitive datamodel special methods. For example:
- `next(obj, default)` allows you to provide a default instead of raising `StopIteration` when an iterator runs out, while `obj.__next__()` does not.
- `str(obj)` falls back to `obj.__repr__()` when `obj.__str__()` isn't available, whereas calling `obj.__str__()` directly would raise an attribute error.
- `obj != other` falls back to `not obj == other` in Python 3 when there is no `__ne__`; calling `obj.__ne__(other)` would not take advantage of this.
(Builtin functions can also be easily overshadowed, if necessary or desirable, on a module's global scope or the `builtins` module, to further customize behavior.)
Mapping the builtins and operators to the datamodel
Here is a mapping, with notes, of the builtin functions and operators to the respective special methods and attributes that they use or return. The usual rule is that a builtin function maps to a special method of the same name, but there are enough exceptions to warrant giving this map below:
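A few representative entries can be checked directly. This is a partial sketch, not the full map: the sample objects (`items`, `number`, `point`) are made up for illustration, and each row pairs a builtin or operator with the special name it relies on for these particular objects.

```python
class Point:
    """Hypothetical class, used only to have an instance __dict__."""

    def __init__(self):
        self.x = 1


items = [3, 1, 2]
number = -7
point = Point()

# (label, result via builtin or operator, result via special name)
pairs = [
    # Same name as the builtin.
    ("len(items)", len(items), items.__len__()),
    ("abs(number)", abs(number), number.__abs__()),
    ("repr(items)", repr(items), items.__repr__()),
    ("hash(number)", hash(number), number.__hash__()),
    ("format(number, '+d')", format(number, "+d"), number.__format__("+d")),
    ("bool(number)", bool(number), number.__bool__()),
    # Name differs from the builtin, or there is only an operator.
    ("2 in items", 2 in items, items.__contains__(2)),
    ("items[0]", items[0], items.__getitem__(0)),
    ("number + 1", number + 1, number.__add__(1)),
    ("type(items)", type(items), items.__class__),
    ("vars(point)", vars(point), point.__dict__),
]

# Labels of any rows where the two spellings disagree.
mismatches = [label for label, via_builtin, via_special in pairs
              if via_builtin != via_special]
```

For these objects `mismatches` stays empty, which is the point: the left-hand spelling is the public one and gives the same answer.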
The `operator` module has `length_hint`, which falls back to a respective special method, `__length_hint__`, if `__len__` is not implemented.
Dotted Lookups
Dotted lookups are contextual. Without a special method implementation, Python first looks in the class hierarchy for data descriptors (like properties and slots), then in the instance `__dict__` (for instance variables), then in the class hierarchy for non-data descriptors (like methods). Special methods implement the following behaviors:
Descriptors
Descriptors are a bit advanced - feel free to skip these entries and come back later - recall the descriptor instance is in the class hierarchy (like methods, slots, and properties). A data descriptor implements either `__set__` or `__delete__`.
When the class is created (defined), the descriptor method `__set_name__` is called, if any descriptor has it, to inform the descriptor of its attribute name. (This is new in Python 3.6.) `cls` is the same as `type(obj)` above, and `'attr'` stands in for the attribute name:
Items (subscript notation)
The subscript notation is also contextual:
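A minimal sketch of what "contextual" means here, using a hypothetical `Recorder` class that logs which special method each form of the subscript notation reaches:

```python
class Recorder:
    """Hypothetical mapping-like class that records special method calls."""

    def __init__(self):
        self.calls = []
        self._data = {}

    def __getitem__(self, key):
        # Reached by: obj[key]
        self.calls.append(("__getitem__", key))
        return self._data[key]

    def __setitem__(self, key, value):
        # Reached by: obj[key] = value
        self.calls.append(("__setitem__", key))
        self._data[key] = value

    def __delitem__(self, key):
        # Reached by: del obj[key]
        self.calls.append(("__delitem__", key))
        del self._data[key]


rec = Recorder()
rec["a"] = 1        # assignment context
value = rec["a"]    # lookup context
del rec["a"]        # deletion context
```

The same square brackets end up in three different special methods depending on whether the expression is read, assigned to, or deleted.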
As a special case for subclasses of `dict`, `__missing__` is called if `__getitem__` doesn't find the key.
Operators
There are also special methods for the `+, -, *, @, /, //, %, divmod(), pow(), **, <<, >>, &, ^, |` operators, for the in-place operators used in augmented assignment, `+=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=`, and for unary operations.
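One example of each kind, as a sketch: the `Vector` class is hypothetical, and only `+`, `+=`, and unary `-` are shown; the other operators follow the same pattern with their own special method names.

```python
class Vector:
    """Hypothetical 2D vector, used only for illustration."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __add__(self, other):
        # Binary operator: v + w returns a new object.
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __iadd__(self, other):
        # In-place operator: v += w mutates v and returns it.
        if not isinstance(other, Vector):
            return NotImplemented
        self.x += other.x
        self.y += other.y
        return self

    def __neg__(self):
        # Unary operator: -v
        return Vector(-self.x, -self.y)


v = Vector(1, 2)
w = Vector(3, 4)

total = v + w          # uses __add__
alias = v
v += w                 # uses __iadd__, so `alias` sees the change too
negated = -w           # uses __neg__
```

If `__iadd__` were not defined, `v += w` would fall back to `v = v + w` and rebind `v` to a new object instead of mutating it.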
Context Managers
A context manager defines `__enter__`, which is called on entering the code block (its return value, usually `self`, is aliased with `as`), and `__exit__`, which is guaranteed to be called on leaving the code block, with exception information (the exception type, value, and traceback).
If `__exit__` gets an exception and then returns a false value, the exception is re-raised on leaving the method.
If there is no exception, `__exit__` gets `None` for those three arguments instead, and the return value is meaningless.
Some Metaclass Special Methods
Similarly, classes can have special methods (from their metaclasses) that support abstract base classes:
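A minimal sketch of this, assuming the two special methods meant are `__instancecheck__` and `__subclasscheck__` (the hooks behind `isinstance` and `issubclass`). The `DuckMeta`, `Duck`, and `Robot` names are hypothetical.

```python
class DuckMeta(type):
    """Hypothetical metaclass: anything with a callable `quack` counts."""

    def __instancecheck__(cls, instance):
        # Reached by: isinstance(instance, cls)
        return callable(getattr(instance, "quack", None))

    def __subclasscheck__(cls, subclass):
        # Reached by: issubclass(subclass, cls)
        return callable(getattr(subclass, "quack", None))


class Duck(metaclass=DuckMeta):
    pass


class Robot:
    """Unrelated class that happens to have a quack method."""

    def quack(self):
        return "beep"


robot_is_duck = isinstance(Robot(), Duck)
robot_class_is_duck = issubclass(Robot, Duck)
number_is_duck = isinstance(42, Duck)
```

Users still write `isinstance` and `issubclass`; only the implementor of the metaclass touches the special names.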
An important takeaway is that while builtins like `next` and `bool` do not change between Python 2 and 3, the underlying implementation names are changing.
Thus using the builtins also offers more forward compatibility.
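To make the renaming concrete, here is a sketch of a hypothetical `Countdown` iterator written to run on both versions. It relies on the known renames (`next` became `__next__`, `__nonzero__` became `__bool__`); the calling code uses only the builtins and so never has to care which name is in effect.

```python
class Countdown:
    """Hypothetical iterator that supports both Python 2 and Python 3 names."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):            # Python 3 name
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__                # Python 2 name for the same method

    def __bool__(self):            # Python 3 name
        return self.current > 0

    __nonzero__ = __bool__         # Python 2 name for the same method


countdown = Countdown(3)

# The calling code is spelled the same on both versions.
first = next(countdown)
remaining = list(countdown)
is_exhausted = not bool(countdown)
```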
When am I supposed to use the special names?
In Python, names that begin with underscores are semantically non-public names for users. The underscore is the creator's way of saying, "hands-off, don't touch."
This is not just cultural, but it is also in Python's treatment of APIs. When a package's `__init__.py` uses `import *` to provide an API from a subpackage, if the subpackage does not provide an `__all__`, it excludes names that start with underscores. The subpackage's `__name__` would also be excluded.
IDE autocompletion tools are mixed in their consideration of names that start with underscores to be non-public. However, I greatly appreciate not seeing `__init__`, `__new__`, `__repr__`, `__str__`, `__eq__`, etc. (nor any of the user created non-public interfaces) when I type the name of an object and a period.
Thus I assert:
The special "dunder" methods are not a part of the public interface. Avoid using them directly.
So when to use them?
The main use-case is when implementing your own custom object or subclass of a builtin object.
Try to only use them when absolutely necessary. Here are some examples:
Use the `__name__` special attribute on functions or classes
When we decorate a function, we typically get a wrapper function in return that hides helpful information about the function. We would use the `@wraps(fn)` decorator to make sure we don't lose that information, but if we need the name of the function, we need to use the `__name__` attribute directly.
Similarly, I do the following when I need the name of the object's class in a method (used in, for example, a `__repr__`):
Using special attributes to write custom classes or subclassed builtins
When we want to define custom behavior, we must use the data-model names.
This makes sense: since we are the implementors, these attributes aren't private to us.
However, even in this case, we don't use `self.value.__eq__(other.value)` or `not self.__eq__(other)` (see my answer here for proof that the latter can lead to unexpected behavior). Instead, we should use the higher level of abstraction.
Another point at which we'd need to use the special method names is when we are in a child's implementation and want to delegate to the parent. For example:
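A minimal sketch of both points, using hypothetical `Money` and `TaggedMoney` classes (illustrative names, not taken from the answer). The parent compares values with `==` rather than calling `__eq__` on them, and the child delegates to the parent's special methods through `super()`.

```python
class Money:
    """Hypothetical value class, used only for illustration."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if not isinstance(other, Money):
            return NotImplemented
        # Use the operator, not self.value.__eq__(other.value).
        return self.value == other.value

    def __repr__(self):
        return f"{type(self).__name__}({self.value!r})"


class TaggedMoney(Money):
    """Hypothetical child class that delegates to its parent."""

    def __init__(self, value, tag):
        # Naming the parent's special method is appropriate here.
        super().__init__(value)
        self.tag = tag

    def __repr__(self):
        return f"{super().__repr__()}[{self.tag}]"


# The raw special method can return NotImplemented, which is truthy,
# whereas the operator resolves the comparison properly.
raw_result = (1).__eq__(2.0)
operator_result = (1 == 2.0)

same_amount = Money(1) == Money(1.0)
tagged = TaggedMoney(5, "USD")
```

The `raw_result` line shows one way the direct call can mislead: used in an `if`, `NotImplemented` would be treated as true even though the numbers differ.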
Conclusion
The special methods allow users to implement the interface for object internals.
Use the builtin functions and operators wherever you can. Only use the special methods where there is no documented public API.
I'll show some usage that you apparently didn't think of, comment on the examples you showed, and argue against the privacy claim from your own answer.
I agree with your own answer that for example `len(a)` should be used, not `a.__len__()`. I'd put it like this: `len` exists so we can use it, and `__len__` exists so `len` can use it. Or however that really works internally, since `len(a)` can actually be much faster, at least for example for lists and strings.
But besides defining these methods in my own classes for usage by builtin functions and operators, I occasionally also use them as follows:
Let's say I need to give a filter function to some function and I want to use a set `s` as the filter. I'm not going to create an extra function `lambda x: x in s` or `def f(x): return x in s`. No. I already have a perfectly fine function that I can use: the set's `__contains__` method. It's simpler and more direct. And even faster, as shown here (ignore that I save it as `f` here, that's just for this timing demo):
So while I don't directly call magic methods like `s.__contains__(x)`, I do occasionally pass them somewhere like `some_function_needing_a_filter(s.__contains__)`. And I think that's perfectly fine, and better than the lambda/def alternative.
My thoughts on the examples you showed:
- `items.__len__()`. Even without any reasoning. My verdict: That's just wrong. Should be `len(items)`.
- `d[key] = value` first! And then adds `d.__setitem__(key, value)` with the reasoning "if your keyboard is missing the square bracket keys", which rarely applies and which I doubt was serious. I think it was just the foot in the door for the last point, mentioning that that's how we can support the square bracket syntax in our own classes. Which turns it back to a suggestion to use square brackets.
- `obj.__dict__`. Bad, like the `__len__` example. But I suspect he just didn't know `vars(obj)`, and I can understand it, as `vars` is less common/known and the name does differ from the "dict" in `__dict__`.
- `__class__`. Should be `type(obj)`. I suspect it's similar to the `__dict__` story, although I think `type` is more well-known.
About privacy: In your own answer you say these methods are "semantically private". I strongly disagree. Single and double leading underscores are for that, but not the data model's special "dunder/magic" methods with double leading+trailing underscores.
`_foo` and `__bar__` and then autocompletion didn't offer `_foo` but did offer `__bar__`. And when I used both methods anyway, PyCharm only warned me about `_foo` (calling it a "protected member"), not about `__bar__`.
I also recommend reading this article by Andrew Montalenti about these methods, emphasizing that "The dunder convention is a namespace reserved for the core Python team" and "Never, ever, invent your own dunders" because "The core Python team reserved a somewhat ugly namespace for themselves". Which all matches PEP 8's instruction "Never invent [dunder/magic] names; only use them as documented". I think Andrew is spot on - it's just an ugly namespace of the core team. And it's for the purpose of operator overloading, not about privacy (not Andrew's point but mine and the data model page's).
Besides Andrew's article I also checked several more about these "magic"/"dunder" methods, and I found none of them talking about privacy at all. That's just not what this is about.
Again, we should use `len(a)`, not `a.__len__()`. But not because of privacy.