Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
def foo(a=[]):
a.append(5)
return a
Python novices would expect this function to always return a list with only one element: [5]
. The result is instead very different, and very astonishing (for a novice):
>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)
Edit:
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:
>>> def a():
... print("a executed")
... return []
...
>>>
>>> def b(x=a()):
... x.append(5)
... print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function or "together" with it?
Doing the binding inside the function would mean that x
is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def
line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
This behavior is easy explained by:
So:
a
doesn't change - every assignment call creates new int object - new object is printedb
doesn't change - new array is build from default value and printedc
changes - operation is performed on same object - and it is printedThis actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.
No default values in sight in this code, but you get exactly the same problem.
The problem is that
foo
is modifying a mutable variable passed in from the caller, when the caller doesn't expect this. Code like this would be fine if the function was called something likeappend_5
; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn't return the list (since the caller already has a reference to that list; the one it just passed in).Your original
foo
, with a default argument, shouldn't be modifyinga
whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we're in Python or not and whether there are default arguments involved or not.If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.
This behavior is not surprising if you take the following into consideration:
The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not "intuitive" when coming from other languages.
(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:
Look back to the original example and consider the above points:
Here
foo
is an object anda
is an attribute offoo
(available atfoo.func_defs[0]
). Sincea
is a list,a
is mutable and is thus a read-write attribute offoo
. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.Calling
foo
without overriding a default uses that default's value fromfoo.func_defs
. In this case,foo.func_defs[0]
is used fora
within function object's code scope. Changes toa
changefoo.func_defs[0]
, which is part of thefoo
object and persists between execution of the code infoo
.Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:
Taking (1) and (2) into account, one can see why this accomplishes the the desired behavior:
foo
function object is instantiated,foo.func_defs[0]
is set toNone
, an immutable object.L
in the function call),foo.func_defs[0]
(None
) is available in the local scope asL
.L = []
, the assignment cannot succeed atfoo.func_defs[0]
, because that attribute is read-only.L
is created in the local scope and used for the remainder of the function call.foo.func_defs[0]
thus remains unchanged for future invocations offoo
.Actually, this is not a design flaw, and it is not because of internals, or performance.
It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.
As soon as you get to think into this way, then it completely makes sense: a function is an object being evaluated on its definition; default parameters are kind of "member data" and therefore their state may change from one call to the other - exactly as in any other object.
In any case, Effbot has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.
1) The so-called problem of "Mutable Default Argument" is in general a special example demonstrating that:
"All functions with this problem suffer also from similar side effect problem on the actual parameter,"
That is against the rules of functional programming, usually undesiderable and should be fixed both together.
Example:
Solution: a copy
An absolutely safe solution is to
copy
ordeepcopy
the input object first and then to do whatever with the copy.Many builtin mutable types have a copy method like
some_dict.copy()
orsome_set.copy()
or can be copied easy likesomelist[:]
orlist(some_list)
. Every object can be also copied bycopy.copy(any_object)
or more thorough bycopy.deepcopy()
(the latter useful if the mutable object is composed from mutable objects). Some objects are fundamentally based on side effects like "file" object and can not be meaningfully reproduced by copy. copyingExample problem for a similar SO question
It shouldn't be neither saved in any public attribute of an instance returned by this function. (Assuming that private attributes of instance should not be modified from outside of this class or subclasses by convention. i.e.
_var1
is a private attribute )Conclusion:
Input parameters objects shouldn't be modified in place (mutated) nor they should not be binded into an object returned by the function. (If we prefere programming without side effects which is strongly recommended. see Wiki about "side effect" (The first two paragraphs are relevent in this context.) .)
2)
Only if the side effect on the actual parameter is required but unwanted on the default parameter then the useful solution is
def ...(var1=None):
if var1 is None:
var1 = []
More..3) In some cases is the mutable behavior of default parameters useful.
Just change the function to be: