Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
def foo(a=[]):
a.append(5)
return a
Python novices would expect this function to always return a list with only one element: [5]
. The result is instead very different, and very astonishing (for a novice):
>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)
Edit:
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:
>>> def a():
... print("a executed")
... return []
...
>>>
>>> def b(x=a()):
... x.append(5)
... print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function or "together" with it?
Doing the binding inside the function would mean that x
is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def
line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
This behavior is not surprising if you take the following into consideration:
The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.
(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:
Look back to the original example and consider the above points:
Here
foo
is an object anda
is an attribute offoo
(available atfoo.func_defs[0]
). Sincea
is a list,a
is mutable and is thus a read-write attribute offoo
. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.Calling
foo
without overriding a default uses that default's value fromfoo.func_defs
. In this case,foo.func_defs[0]
is used fora
within function object's code scope. Changes toa
changefoo.func_defs[0]
, which is part of thefoo
object and persists between execution of the code infoo
.Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:
Taking (1) and (2) into account, one can see why this accomplishes the the desired behavior:
foo
function object is instantiated,foo.func_defs[0]
is set toNone
, an immutable object.L
in the function call),foo.func_defs[0]
(None
) is available in the local scope asL
.L = []
, the assignment cannot succeed atfoo.func_defs[0]
, because that attribute is read-only.L
is created in the local scope and used for the remainder of the function call.foo.func_defs[0]
thus remains unchanged for future invocations offoo
.Every other answer explains why this is actually a nice and desired behavior, or why you shouldn't be needing this anyway. Mine is for those stubborn ones who want to exercise their right to bend the language to their will, not the other way around.
We will "fix" this behavior with a decorator that will copy the default value instead of reusing the same instance for each positional argument left at its default value.
Now let's redefine our function using this decorator:
This is particularly neat for functions that take multiple arguments. Compare:
with
It's important to note that the above solution breaks if you try to use keyword args, like so:
The decorator could be adjusted to allow for that, but we leave this as an exercise for the reader ;)
Actually, this is not a design flaw, and it is not because of internals, or performance.
It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.
As soon as you get to think into this way, then it completely makes sense: a function is an object being evaluated on its definition; default parameters are kind of "member data" and therefore their state may change from one call to the other - exactly as in any other object.
In any case, Effbot has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.
This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)
I'm gonna give you what I see as a useful example.
prints the following
Suppose you have the following code
When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, that it will be equal to the tuple
("apples", "bananas", "loganberries")
However, supposed later on in the code, I do something like
then if default parameters were bound at function execution rather than function declaration then I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your
foo
function above was mutating the list.The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose in Java I have the following code:
Now, does my map use the value of the
StringBuffer
key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of theMap
using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map (this is actually why Python doesn't allow its mutable built-in data types to be used as dictionary keys).Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, then that would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they're writing.
I personally like Python's current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.
I know nothing about the Python interpreter inner workings (and I'm not an expert in compilers and interpreters either) so don't blame me if I propose anything unsensible or impossible.
Provided that python objects are mutable I think that this should be taken into account when designing the default arguments stuff. When you instantiate a list:
you expect to get a new list referenced by
a
.Why should the
a=[]
ininstantiate a new list on function definition and not on invocation? It's just like you're asking "if the user doesn't provide the argument then instantiate a new list and use it as if it was produced by the caller". I think this is ambiguous instead:
user, do you want
a
to default to the datetime corresponding to when you're defining or executingx
? In this case, as in the previous one, I'll keep the same behaviour as if the default argument "assignment" was the first instruction of the function (datetime.now()
called on function invocation). On the other hand, if the user wanted the definition-time mapping he could write:I know, I know: that's a closure. Alternatively Python might provide a keyword to force definition-time binding: