Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
def foo(a=[]):
    a.append(5)
    return a
Python novices would expect this function to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):
>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior has an underlying explanation, and that it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C without breeding bugs?).
Edit:
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:
>>> def a():
...     print("a executed")
...     return []
...
>>>
>>> def b(x=a()):
...     x.append(5)
...     print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function or "together" with it?
Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
Python: The Mutable Default Argument
Default arguments get evaluated at the time the function is compiled into a function object. When used by the function, multiple times by that function, they are and remain the same object.
When they are mutable, when mutated (for example, by adding an element to it) they remain mutated on consecutive calls.
They stay mutated because they are the same object each time.
Equivalent code:
Since the list is bound to the function when the function object is compiled and instantiated, this:
is almost exactly equivalent to this:
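A minimal sketch of the comparison, assuming two functions named foo and bar and a temporary module-level name _a_list (all illustrative names, not taken from the text above). The first form lets the def statement build the list; the second builds the list first, uses it as the default, then drops the extra name:

```python
# Form 1: the list is created once, when the def statement runs.
def foo(mutable_default_argument=[]):
    """Function whose default is a list created at definition time."""
    return mutable_default_argument


# Form 2: roughly the same effect, spelled out by hand.
_a_list = []  # hypothetical temporary name for the shared list


def bar(mutable_default_argument=_a_list):
    """Function whose default is the pre-built list."""
    return mutable_default_argument


del _a_list  # remove the module-level name; bar still holds the list

# In both forms, every call without an argument returns the same object.
print(foo() is foo())  # True
print(bar() is bar())  # True
```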
Demonstration
Here's a demonstration - you can verify that they are the same object each time they are referenced by saving the demonstration as example.py and running it with python example.py.
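A minimal sketch of what such an example.py could look like, assuming hypothetical names (new_list, foo, main). It prints a message when the default list is created and shows the list's id on each call:

```python
def new_list():
    """Build the default list, announcing when that happens."""
    print("new_list called: creating the default list")
    return []


print("about to define foo")


def foo(mutable_default_argument=new_list()):
    """Append to the default list and return it."""
    print("foo called, default id:", id(mutable_default_argument))
    mutable_default_argument.append("mutation")
    return mutable_default_argument


print("foo defined")


def main():
    first = foo()
    second = foo()
    # Both calls hand back the very same list object.
    print("same object:", first is second)
    print("contents:", second)


if __name__ == "__main__":
    main()
```

The "creating the default list" message appears exactly once, between the two definition messages and before foo is ever called.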
Does this violate the principle of "Least Astonishment"?
This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.
The usual instruction to new Python users:
But this is why the usual instruction to new users is to create their default arguments like this instead:
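A minimal sketch of that pattern (the function and parameter names are illustrative, not necessarily the ones the original post used):

```python
def foo(mutable_default_argument=None):
    """Append to a fresh list unless the caller supplies one."""
    if mutable_default_argument is None:
        # No argument given: build a new list on every call.
        mutable_default_argument = []
    mutable_default_argument.append(5)
    return mutable_default_argument


print(foo())  # [5]
print(foo())  # [5]
```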
This uses the None singleton as a sentinel object to tell the function whether or not we've gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [], as the default.
As the tutorial section on control flow says:
What you're asking is why this:
isn't internally equivalent to this:
except for the case of explicitly calling func(None, None), which we'll ignore.
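A sketch of the two forms being compared, assuming a function with two defaulted parameters a and b (the names func_eager, func_lazy, make_func_lazy and the default values are illustrative):

```python
# Form 1: defaults evaluated once, when the def statement runs.
def func_eager(a=[], b=2):
    return a, b


# Form 2: what call-time evaluation could look like if written by hand.
# Each default is stored as a small function and evaluated on every call.
def make_func_lazy():
    a_default = lambda: []
    b_default = lambda: 2

    def func_lazy(a=None, b=None):
        if a is None:
            a = a_default()
        if b is None:
            b = b_default()
        return a, b

    return func_lazy


func_lazy = make_func_lazy()

# Eager defaults share one list; lazy defaults build a new one per call.
print(func_eager()[0] is func_eager()[0])  # True
print(func_lazy()[0] is func_lazy()[0])    # False
```

In this sketch func_lazy is a closure over a_default and b_default, so the unevaluated defaults have to be kept alive alongside the function.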
In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?
One answer is probably right there--it would effectively turn every function with default parameters into a closure. Even if it's all hidden away in the interpreter and not a full-blown closure, the data's got to be stored somewhere. It'd be slower and use more memory.
I think the answer to this question lies in how Python passes data to parameters (by value or by reference), not in mutability or in how Python handles the "def" statement.
A brief introduction. First, every value in Python is an object, including simple ones such as numbers. Second, when passing data to parameters, Python passes a reference to the object rather than a copy of it; for immutable objects such as numbers this looks the same as passing by value, because the object cannot be changed through the reference.
Given these two points, let's explain what happens in the Python code. It happens only because objects are passed by reference; it has nothing to do with mutable/immutable or, arguably, with the fact that the "def" statement is executed only once, when the function is defined.
[] is an object, so Python passes the reference of [] to a, i.e., a is only a pointer to [], which lies in memory as an object. There is only one copy of [] with, however, many references to it. For the first foo(), the list [] is changed to [1] by the append method. But note that there is only one copy of the list object, and this object now becomes [1]. When running the second foo(), what the effbot webpage says (items is not evaluated any more) is wrong: a is evaluated to be the list object, although now the content of the object is [1]. This is the effect of passing by reference! The result of foo(3) can be easily derived in the same way.
To further validate my answer, let's take a look at two additional pieces of code.
====== No. 2 ========
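A sketch of the code this case appears to describe, reconstructed from the explanation that follows (foo, x, and items are the names the text uses; the specific calls are illustrative):

```python
def foo(x, items=None):
    # items is bound to None on every call that omits it.
    if items is None:
        items = []  # rebind the local name to a brand-new list
    items.append(x)
    return items


print(foo(1))  # [1]
print(foo(2))  # [2]
print(foo(3))  # [3]
```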
[] is an object, and so is None (the former is mutable while the latter is immutable, but the mutability has nothing to do with the question). None is somewhere in memory, but we know it's there and there is only one copy of None. So every time foo is invoked, items is evaluated (as opposed to some answer that it is only evaluated once) to be None, or, to be clear, the reference (or the address) of None. Then in foo, items is changed to [], i.e., it points to another object, which has a different address.
====== No. 3 =======
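A sketch of the code this case walks through, reconstructed from the calls named in the explanation below (foo(1), foo(2, []), foo(3)):

```python
def foo(x, items=[]):
    items.append(x)
    return items


print(foo(1))      # [1]    (the default list, created at definition)
print(foo(2, []))  # [2]    (a separate list supplied by the caller)
print(foo(3))      # [1, 3] (the default list again)
```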
The invocation of foo(1) makes items point to a list object [] with an address, say, 11111111. The content of the list is then changed to [1] inside foo, but the address is not changed; it is still 11111111. Then foo(2, []) comes along. Although the [] in foo(2, []) has the same content as the default parameter [] when calling foo(1), their addresses are different! Since we provide the parameter explicitly, items has to take the address of this new [], say 2222222, and return it after making some change. Now foo(3) is executed. Since only x is provided, items has to take its default value again. What's the default value? It is set when defining the foo function: the list object located at 11111111. So items is evaluated to be the address 11111111, which holds the element 1. The list located at 2222222 also contains one element, 2, but it is no longer pointed to by items. Consequently, appending 3 makes items [1, 3].
From the above explanations, we can see that the effbot webpage recommended in the accepted answer failed to give a relevant answer to this question. What is more, I think a point in the effbot webpage is wrong. I think the code regarding the UI.Button is correct:
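As a rough stand-in for the button code (UI.Button is not defined here, so this sketch assumes plain functions collected in two hypothetical lists, late and bound), note that what each callback displays depends on how it captures i. A callback that reads i from the enclosing scope looks it up when called; one that takes i as a default argument keeps the value from its own definition:

```python
late = []   # callbacks that read i from the enclosing scope
bound = []  # callbacks that capture i through a default argument

for i in range(10):
    def late_callback():
        return i  # looked up when the callback is called

    def bound_callback(i=i):
        return i  # default evaluated when this def statement ran

    late.append(late_callback)
    bound.append(bound_callback)

print(late[7](), late[9]())    # 9 9
print(bound[7](), bound[9]())  # 7 9
```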
Each button can hold a distinct callback function which will display a different value of i. I can provide an example to show this:
If we execute x[7]() we'll get 7 as expected, and x[9]() will give 9, another value of i.
Already busy topic, but from what I read here, the following helped me realize how it works internally:
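One way to see this, sketched with two hypothetical functions bar and baz: the default list is stored on the function object (in its __defaults__ attribute). Rebinding the parameter inside the function leaves that stored list untouched, while mutating it in place changes what later calls see:

```python
def bar(a=[]):
    before = id(a)
    a = a + [42]  # builds a new list and rebinds the local name a
    after = id(a)
    return a, before != after


result, rebound = bar()
print(result)            # [42]
print(rebound)           # True: a now names a different object
print(bar.__defaults__)  # ([],) the stored default was not mutated


def baz(a=[]):
    a.append(42)  # mutates the stored default in place
    return a


baz()
baz()
print(baz.__defaults__)  # ([42, 42],)
```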
TLDR: Define-time defaults are consistent and strictly more expressive.
Defining a function affects two scopes: the defining scope containing the function, and the execution scope contained by the function. While it is pretty clear how blocks map to scopes, the question is where def <name>(<args=defaults>): belongs:
The def name part must evaluate in the defining scope - we want name to be available there, after all. Evaluating the function only inside itself would make it inaccessible.
Since parameter is a constant name, we can "evaluate" it at the same time as def name. This also has the advantage that it produces the function with a known signature as name(parameter=...):, instead of a bare name(...):.
Now, when to evaluate default?
Consistency already says "at definition": everything else of def <name>(<args=defaults>): is best evaluated at definition as well. Delaying parts of it would be the astonishing choice.
The two choices are not equivalent, either: If default is evaluated at definition time, it can still affect execution time. If default is evaluated at execution time, it cannot affect definition time. Choosing "at definition" allows expressing both cases, while choosing "at execution" can express only one.
Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well... when the function is defined.
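The point that a def statement is itself executed can be sketched as follows, assuming a hypothetical factory function make. Each time the inner def runs, its default is evaluated again, so each resulting function gets its own list:

```python
def make():
    def f(items=[]):  # this def statement runs on every call to make()
        items.append(1)
        return items
    return f


f1 = make()
f2 = make()
print(f1())  # [1]
print(f1())  # [1, 1]
print(f2())  # [1]
```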
Compare this:
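A sketch of the kind of class the next paragraph describes, assuming a hypothetical class Banana with a method add_banana (only the attribute name bananas comes from the text):

```python
class Banana:
    bananas = []  # class attribute: created once, when the class body runs

    def add_banana(self, banana):
        # self.bananas finds the class attribute, so this mutates the shared list.
        self.bananas.append(banana)


first = Banana()
second = Banana()
first.add_banana("cavendish")
print(second.bananas)  # ['cavendish']
```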
This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it's added to all instances of that class. The reason is exactly the same.
It's just "How It Works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.
Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll grok python much better.
That said it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.