Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
def foo(a=[]):
    a.append(5)
    return a
Python novices would expect this function to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):
>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)
Edit:
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:
>>> def a():
...     print("a executed")
...     return []
...
>>>
>>> def b(x=a()):
...     x.append(5)
...     print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function or "together" with it?
Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
Why don't you introspect?
I'm really surprised no one has performed the insightful introspection offered by Python (2 and 3 apply) on callables.
Given a simple little function func defined as:
When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. When the def statement is then executed, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the top answer mentioned: the list a can now be considered a member of the function func.
So, let's do some introspection, a before and after, to examine how the list gets expanded inside the function object. I'm using Python 3.x for this; for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).
Function Before Execution:
After Python executes this definition it will take any default parameters specified (a = [] here) and cram them in the __defaults__ attribute for the function object (relevant section: Callables):
O.k, so an empty list as the single entry in __defaults__, just as expected.
Function After Execution:
Let's now execute this function:
Now, let's see those __defaults__ again:
Astonished? The value inside the object changes! Consecutive calls to the function will now simply append to that embedded list object:
So, there you have it, the reason why this 'flaw' happens is that default arguments are part of the function object. There's nothing weird going on here, it's all just a bit surprising.
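The before-and-after inspection described above can be sketched as follows. This assumes func simply appends 5 to its default list; the body is an assumption, borrowed from the question's foo.

```python
def func(a=[]):
    # Assumed body: mutate the default list in place.
    a.append(5)

# Before any call: the default list lives on the function object.
before = list(func.__defaults__[0])  # snapshot copy of the default
print(func.__defaults__)             # ([],)

func()
print(func.__defaults__)             # ([5],)

# Further calls keep appending to the same embedded list.
func()
func()
print(func.__defaults__)             # ([5, 5, 5],)
```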
The common solution to combat this is to use None as the default and then initialize in the function body:
Since the function body is executed anew each time, you always get a fresh new empty list if no argument was passed for a.
To further verify that the list in __defaults__ is the same as that used in the function func you can just change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__) and you'll see how these are indeed referring to the same list instance:
All with the power of introspection!
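A minimal sketch of that identity check, again assuming a func whose body appends 5 to a (the names default_list and same_object are introduced here for illustration only):

```python
def func(a=[]):
    a.append(5)
    # Return the identity of the list actually used inside the body.
    return id(a)

# The list stored on the function object, at position [0] of __defaults__.
default_list = func.__defaults__[0]

# Call without arguments and compare identities.
same_object = func() == id(default_list)
print(same_object)  # True
```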
* To verify that Python evaluates the default arguments when the function is defined, rather than when it is called, try executing the following:
as you'll notice, input() is called before the process of building the function and binding it to the name bar is made.
Python: The Mutable Default Argument
Default arguments are evaluated once, when the def statement is executed and the function object is created. Each time the function uses them, they are and remain the same object.
When they are mutable and get mutated (for example, by adding an element), they remain mutated on consecutive calls.
They stay mutated because they are the same object each time.
Equivalent code:
Since the list is bound to the function when the function object is compiled and instantiated, this:
is almost exactly equivalent to this:
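A minimal sketch of that equivalence. The names foo_inline, foo_hoisted and _a_list are hypothetical, and the function bodies are assumptions added so the effect is visible:

```python
def foo_inline(mutable_default_argument=[]):
    """Function whose default list is written inline."""
    mutable_default_argument.append(1)
    return mutable_default_argument

_a_list = []  # create the list first, under a temporary global name

def foo_hoisted(mutable_default_argument=_a_list):
    """Function whose default list was created beforehand."""
    mutable_default_argument.append(1)
    return mutable_default_argument

del _a_list  # the name is gone, but the function still holds the list

first = foo_inline()
second = foo_inline()
print(first is second, second)   # True [1, 1]

third = foo_hoisted()
fourth = foo_hoisted()
print(third is fourth, fourth)   # True [1, 1]
```

In both versions the list outlives the def statement because the function object keeps a reference to it in __defaults__.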
Demonstration
Here's a demonstration - you can verify that they are the same object each time they are referenced by
example.py
and running it with python example.py:
Does this violate the principle of "Least Astonishment"?
This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.
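The order of execution can be observed directly. A minimal sketch, with hypothetical names (events, make_default, foo), that records when the default expression runs relative to the def statement and the calls:

```python
events = []  # records the order in which things happen

def make_default():
    events.append("default evaluated")
    return []

events.append("before def")

def foo(items=make_default()):
    events.append("foo called")
    items.append(len(items))
    return items

events.append("after def")

first = foo()
second = foo()

# The default is evaluated exactly once, while the def statement runs.
print(events)
# ['before def', 'default evaluated', 'after def', 'foo called', 'foo called']
print(first is second, second)  # True [0, 1]
```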
The usual instruction to new Python users:
But this is why the usual instruction to new users is to create their default arguments like this instead:
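A minimal sketch of that pattern (the function name append_five is hypothetical):

```python
def append_five(a=None):
    # None signals "no argument given"; build a fresh list on each such call.
    if a is None:
        a = []
    a.append(5)
    return a

print(append_five())     # [5]
print(append_five())     # [5]
print(append_five([1]))  # [1, 5]
```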
This uses the None singleton as a sentinel object to tell the function whether or not we've gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [], as the default.
As the tutorial section on control flow says:
I know nothing about the Python interpreter inner workings (and I'm not an expert in compilers and interpreters either) so don't blame me if I propose anything unsensible or impossible.
Provided that Python objects are mutable I think that this should be taken into account when designing the default arguments stuff. When you instantiate a list:
you expect to get a new list referenced by a.
Why should the a=[] in the function signature instantiate a new list on function definition and not on invocation? It's just like you're asking "if the user doesn't provide the argument then instantiate a new list and use it as if it was produced by the caller". I think this is ambiguous instead:
user, do you want a to default to the datetime corresponding to when you're defining or executing x? In this case, as in the previous one, I'll keep the same behaviour as if the default argument "assignment" was the first instruction of the function (datetime.now() called on function invocation). On the other hand, if the user wanted the definition-time mapping he could write:
I know, I know: that's a closure. Alternatively Python might provide a keyword to force definition-time binding:
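Python has no such keyword, so both behaviours have to be spelled out by hand. A minimal sketch of the two options discussed in this answer (function names are hypothetical):

```python
import datetime

def stamped_at_definition(when=datetime.datetime.now()):
    # The default was computed once, when this def statement ran.
    return when

def stamped_at_call(when=None):
    # Emulates call-time defaults: compute the value inside the body.
    if when is None:
        when = datetime.datetime.now()
    return when

definition_time = stamped_at_definition()
# Every bare call returns the very same datetime object.
print(stamped_at_definition() is definition_time)  # True

# Each bare call builds a new datetime object.
print(stamped_at_call() is stamped_at_call())      # False
```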
I used to think that creating the objects at runtime would be the better approach. I'm less certain now, since you do lose some useful features, though it may be worth it regardless simply to prevent newbie confusion. The disadvantages of doing so are:
1. Performance
If call-time evaluation is used, then the expensive function is called every time your function is used without an argument. You'd either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.
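The trade-off can be made concrete with a call counter. A minimal sketch, where expensive_computation, eager and lazy are hypothetical names and the "expensive" work is only simulated:

```python
calls = {"count": 0}

def expensive_computation():
    # Hypothetical stand-in for a slow function; it only counts its calls.
    calls["count"] += 1
    return 42

def eager(arg=expensive_computation()):
    # Definition-time default: the computation ran once, at the def above.
    return arg

def lazy(arg=None):
    # Emulated call-time default: the computation runs on every bare call.
    if arg is None:
        arg = expensive_computation()
    return arg

for _ in range(3):
    eager()
print(calls["count"])  # 1

for _ in range(3):
    lazy()
print(calls["count"])  # 4
```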
2. Forcing bound parameters
A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:
This returns a list of functions that return 0,1,2,3... respectively. If the behaviour is changed, they will instead bind i to the call-time value of i, so you would get a list of functions that all returned 9.
The only way to implement this otherwise would be to create a further closure with the i bound, ie:
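A sketch of the three variants, assuming range(10) (consistent with the value 9 mentioned above) and hypothetical names:

```python
# Default argument: captures the current value of i when the lambda is created.
bound = [lambda i=i: i for i in range(10)]
print([f() for f in bound])    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# No default: every lambda looks up i only when it is called.
unbound = [lambda: i for i in range(10)]
print([f() for f in unbound])  # [9, 9, 9, 9, 9, 9, 9, 9, 9, 9]

# The alternative without defaults: a further closure with i bound.
def make_func(i):
    return lambda: i

closed = [make_func(i) for i in range(10)]
print([f() for f in closed])   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```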
3. Introspection
Consider the code:
We can get information about the arguments and defaults using the inspect module.
This information is very useful for things like document generation, metaprogramming, decorators etc.
Now, suppose the behaviour of defaults could be changed so that this is the equivalent of:
However, we've lost the ability to introspect, and see what the default arguments are. Because the objects haven't been constructed, we can't ever get hold of them without actually calling the function. The best we could do is to store off the source code and return that as a string.
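A minimal sketch of the kind of introspection that definition-time defaults make possible. It uses inspect.signature from Python 3; the function foo and its parameters are hypothetical:

```python
import inspect

def foo(a="test", b=100, c=[]):
    # Hypothetical example function with three defaults.
    return a, b, c

signature = inspect.signature(foo)
defaults = {name: p.default for name, p in signature.parameters.items()}

# The default objects already exist, so they can be inspected without a call.
print(defaults)          # {'a': 'test', 'b': 100, 'c': []}
print(foo.__defaults__)  # ('test', 100, [])
```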
This is already a busy topic, but from what I read here, the following helped me realize how it works internally:
This behavior is easily explained by:
So:
a doesn't change - every assignment creates a new int object - the new object is printed
b doesn't change - a new array is built from the default value and printed
c changes - the operation is performed on the same object - and it is printed
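A sketch consistent with the three cases listed above. The function and its body are assumptions, written only to reproduce the described behaviour of a, b and c:

```python
def func(a=1, b=[], c=[]):
    a = a + 1    # rebinding: creates a new int, the default stays 1
    b = b + [1]  # builds a new list from the default, default untouched
    c.append(1)  # mutates the default list itself
    return a, b, c

first = func()
print(first)              # (2, [1], [1])
second = func()
print(second)             # (2, [1], [1, 1])

# Only the default for c has changed on the function object.
print(func.__defaults__)  # (1, [], [1, 1])
```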