Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:
def foo(a=[]):
a.append(5)
return a
Python novices would expect this function to always return a list with only one element: [5]
. The result is instead very different, and very astonishing (for a novice):
>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()
A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)
Edit:
Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:
>>> def a():
... print("a executed")
... return []
...
>>>
>>> def b(x=a()):
... x.append(5)
... print(x)
...
a executed
>>> b()
[5]
>>> b()
[5, 5]
To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function or "together" with it?
Doing the binding inside the function would mean that x
is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def
line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.
The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
This "bug" gave me a lot of overtime work hours! But I'm beginning to see a potential use of it (but I would have liked it to be at the execution time, still)
I'm gonna give you what I see as a useful example.
prints the following
I sometimes exploit this behavior as an alternative to the following pattern:
If
singleton
is only used byuse_singleton
, I like the following pattern as a replacement:I've used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.
Since I don't think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
Suppose you have the following code
When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, that it will be equal to the tuple
("apples", "bananas", "loganberries")
However, supposed later on in the code, I do something like
then if default parameters were bound at function execution rather than function declaration then I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your
foo
function above was mutating the list.The real problem lies with mutable variables, and all languages have this problem to some extent. Here's a question: suppose in Java I have the following code:
Now, does my map use the value of the
StringBuffer
key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of theMap
using a value identical to the one they put it in with, or the person who can't seem to retrieve their object even though the key they're using is literally the same object that was used to put it into the map (this is actually why Python doesn't allow its mutable built-in data types to be used as dictionary keys).Your example is a good one of a case where Python newcomers will be surprised and bitten. But I'd argue that if we "fixed" this, then that would only create a different situation where they'd be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they're writing.
I personally like Python's current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.
Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well... when the functions is defined.
Compare this:
This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it's added to all instances of that class. The reason is exactly the same.
It's just "How It Works", and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.
Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it's a good teaching aid, and once you understand why this happens, you'll grok python much better.
That said it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.
The solutions here are:
None
as your default value (or a nonceobject
), and switch on that to create your values at runtime; orlambda
as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).The second option is nice because users of the function can pass in a callable, which may be already existing (such as a
type
)This is not a design flaw. Anyone who trips over this is doing something wrong.
There are 3 cases I see where you might run into this problem:
cache={}
, and you wouldn't be expected to call the function with an actual argument at all.The example in the question could fall into category 1 or 3. It's odd that it both modifies the passed list and returns it; you should pick one or the other.