“Least Astonishment” and the Mutable Default Argument

Posted 2019-09-09 22:08

Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:

def foo(a=[]):
    a.append(5)
    return a

Python novices would expect this function to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):

>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()

A manager of mine once had his first encounter with this feature, and called it "a dramatic design flaw" of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don't understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)

Edit:

Baczek made an interesting example. Together with most of your comments and Utaal's in particular, I elaborated further:

>>> def a():
...     print("a executed")
...     return []
... 
>>>            
>>> def b(x=a()):
...     x.append(5)
...     print(x)
... 
a executed
>>> b()
[5]
>>> b()
[5, 5]

To me, it seems that the design decision was about where to put the scope of parameters: inside the function or "together" with it?

Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be "hybrid" in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.

The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.
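
One way to check that the default belongs to the function object rather than to each call is to look at the function's `__defaults__` attribute, where the evaluated defaults are stored. A minimal sketch, reusing foo from the top of the question:

```python
def foo(a=[]):
    a.append(5)
    return a

# The default list was created once, when the def statement ran,
# and is stored on the function object itself.
print(foo.__defaults__)   # ([],)

foo()
foo()

# Both calls appended to that same stored list.
print(foo.__defaults__)   # ([5, 5],)
```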

30 answers
祖国的老花朵
#2 · 2019-09-09 23:07

I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).

As others have extensively commented, the list parameter is bound to the function when it is defined as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list, which may have been altered by any other call. Worse yet, two variables can refer to the function's shared default at the same time, each oblivious to the changes made through the other.

Wrong Method (probably...):

def foo(list_arg=[5]):
    return list_arg

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
# The value of 6 appended to variable 'a' is now part of the list held by 'b'.
>>> b
[5, 6, 7]  

# Although 'a' is expecting to receive 6 (the last element it appended to the list),
# it actually receives the last element appended to the shared list.
# It thus receives the value 7 previously appended by 'b'.
>>> a.pop()             
7

You can verify that they are one and the same object by using id:

>>> id(a)
5347866528

>>> id(b)
5347866528

Per Brett Slatkin's "Effective Python: 59 Specific Ways to Write Better Python", Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)

The convention for achieving the desired result in Python is to provide a default value of None and to document the actual behaviour in the docstring.

This implementation ensures that each call to the function either receives the default list or else the list passed to the function.

Preferred Method:

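def foo(list_arg=None):
    """
    :param list_arg:  A list of input values.
                      If none provided, use a list with a default value of 5.
    """
    # Test for None explicitly: "if not list_arg" would also discard an
    # empty list passed in by the caller.
    if list_arg is None:
        list_arg = [5]
    return list_arg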

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
>>> b
[5, 7]

c = foo([10])
c.append(11)
>>> c
[10, 11]

There may be legitimate use cases for the 'Wrong Method' whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.
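
One detail of the None convention is worth spelling out: a truthiness test such as `if not list_arg` treats an empty list passed by the caller the same as a missing argument, whereas comparing with `is None` does not. The sketch below illustrates the difference; `collect` and `mine` are hypothetical names used only here.

```python
def collect(items=None):
    # Compare against None explicitly so that an empty list supplied by
    # the caller is kept rather than replaced by the default.
    if items is None:
        items = [5]
    items.append(6)
    return items

mine = []
result = collect(mine)
print(result is mine)   # True: the caller's own (empty) list was used
print(collect())        # [5, 6]
print(collect())        # [5, 6] again, a fresh default on each call
```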

【Aperson】
#3 · 2019-09-09 23:07

You can get round this by replacing the object (and therefore the tie with the scope):

def foo(a=[]):
    a = list(a)
    a.append(5)
    return a

Ugly, but it works.
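
A side effect of this approach is that a list passed in by the caller is left untouched as well, because the function only ever mutates its own copy. A short sketch (the variable `mine` is introduced here for illustration):

```python
def foo(a=[]):
    a = list(a)      # shallow copy; only the local name `a` is rebound
    a.append(5)
    return a

print(foo())         # [5]
print(foo())         # [5]  (the stored default is never mutated)

mine = [1, 2]
print(foo(mine))     # [1, 2, 5]
print(mine)          # [1, 2]  (the caller's list is unchanged)
```

Note that `list(a)` makes a shallow copy, so mutable elements inside the list would still be shared.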

贪生不怕死
#4 · 2019-09-09 23:09

This is not a design flaw. Anyone who trips over this is doing something wrong.

There are 3 cases I see where you might run into this problem:

  1. You intend to modify the argument as a side effect of the function. In this case it never makes sense to have a default argument. The only exception is when you're abusing the argument list to have function attributes, e.g. cache={}, and you wouldn't be expected to call the function with an actual argument at all.
  2. You intend to leave the argument unmodified, but you accidentally did modify it. That's a bug, fix it.
  3. You intend to modify the argument for use inside the function, but didn't expect the modification to be viewable outside of the function. In that case you need to make a copy of the argument, whether it was the default or not! Python is not a call-by-value language, so it doesn't make the copy for you; you need to be explicit about it.

The example in the question could fall into category 1 or 3. It's odd that it both modifies the passed list and returns it; you should pick one or the other.
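
Cases 1 and 3 can be sketched as follows. Both functions are hypothetical illustrations rather than code from the question: `fib` uses the cache={} idiom mentioned in case 1, and `with_total` copies its argument as case 3 recommends.

```python
def fib(n, cache={}):
    # Case 1: the default dict is deliberately shared across calls and
    # acts as a memo table; callers are not expected to pass `cache`.
    if n < 2:
        return n
    if n not in cache:
        cache[n] = fib(n - 1) + fib(n - 2)
    return cache[n]

def with_total(values):
    # Case 3: copy first, so the caller's list is never modified.
    result = list(values)
    result.append(sum(values))
    return result

print(fib(30))             # 832040
data = [1, 2, 3]
print(with_total(data))    # [1, 2, 3, 6]
print(data)                # [1, 2, 3]
```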

【Aperson】
#5 · 2019-09-09 23:10

I know nothing about the inner workings of the Python interpreter (and I'm not an expert in compilers and interpreters either), so don't blame me if I propose anything nonsensical or impossible.

Given that Python objects can be mutable, I think this should be taken into account when designing how default arguments work. When you instantiate a list:

a = []

you expect to get a new list referenced by a.

Why should the a=[] in

def x(a=[]):

instantiate a new list on function definition and not on invocation? It's just like you're asking "if the user doesn't provide the argument then instantiate a new list and use it as if it was produced by the caller". I think this is ambiguous instead:

def x(a=datetime.datetime.now()):

user, do you want a to default to the datetime corresponding to when you're defining or executing x? In this case, as in the previous one, I'll keep the same behaviour as if the default argument "assignment" was the first instruction of the function (datetime.now() called on function invocation). On the other hand, if the user wanted the definition-time mapping he could write:

b = datetime.datetime.now()
def x(a=b):

I know, I know: that's a closure. Alternatively Python might provide a keyword to force definition-time binding:

def x(static a=b):
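
For comparison, both behaviours contrasted in this answer can be written in Python as it exists today, without any new keyword. The sketch below uses hypothetical function names: the first default is evaluated once when the def statement runs (the actual behaviour), and the second uses a None default to get call-time evaluation.

```python
import datetime
import time

def stamped_at_definition(when=datetime.datetime.now()):
    # Actual Python behaviour: now() ran once, when the def executed.
    return when

def stamped_at_call(when=None):
    # The usual workaround when call-time evaluation is wanted.
    if when is None:
        when = datetime.datetime.now()
    return when

first = stamped_at_definition()
time.sleep(0.05)
second = stamped_at_definition()
print(first is second)   # True: the same stored datetime object

early = stamped_at_call()
time.sleep(0.05)
late = stamped_at_call()
print(early < late)      # True unless the system clock was adjusted in between
```
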
够拽才男人
#6 · 2019-09-09 23:10

5 points in defense of Python

  1. Simplicity: The behavior is simple in the following sense: Most people fall into this trap only once, not several times.

  2. Consistency: Python always passes objects, not names. The default parameter is, obviously, part of the function heading (not the function body). It therefore ought to be evaluated at module load time (and only at module load time, unless nested), not at function call time.

  3. Usefulness: As Fredrik Lundh points out in his explanation of "Default Parameter Values in Python", the current behavior can be quite useful for advanced programming. (Use sparingly.)

  4. Sufficient documentation: In the most basic Python documentation, the tutorial, the issue is loudly announced as an "Important warning" in the first subsection of Section "More on Defining Functions". The warning even uses boldface, which is rarely applied outside of headings. RTFM: Read the fine manual.

  5. Meta-learning: Falling into the trap is actually a very helpful moment (at least if you are a reflective learner), because you will subsequently better understand the point "Consistency" above and that will teach you a great deal about Python.
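
Two of the points above can be illustrated with a short sketch. The first part shows the "unless nested" remark from the Consistency point: an inner def is re-executed on every call of the outer function, so its defaults are re-evaluated each time. The second part shows one commonly cited use of definition-time evaluation, capturing a loop variable's current value; whether this is among the uses the cited article has in mind is an assumption. All names are hypothetical.

```python
def make_counter():
    # The inner def runs every time make_counter() is called, so a new
    # default list is created for each inner function object.
    def counter(seen=[]):
        seen.append(1)
        return len(seen)
    return counter

c1 = make_counter()
c2 = make_counter()
print(c1(), c1(), c2())        # 1 2 1

# Without a default, each lambda looks up i when it is called.
late = [lambda: i for i in range(3)]
# With i=i, the current value of i is stored when the lambda is defined.
early = [lambda i=i: i for i in range(3)]
print([f() for f in late])     # [2, 2, 2]
print([f() for f in early])    # [0, 1, 2]
```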

干净又极端
#7 · 2019-09-09 23:10

When we do this:

def foo(a=[]):
    ...

... we assign the argument a to an unnamed list, if the caller does not pass the value of a.

To make things simpler for this discussion, let's temporarily give the unnamed list a name. How about pavlo ?

def foo(a=pavlo):
   ...

At any time, if the caller doesn't tell us what a is, we reuse pavlo.

If pavlo is mutable (modifiable) and foo ends up modifying it, we notice the effect the next time foo is called without specifying a.

So this is what you see (Remember, pavlo is initialized to []):

>>> foo()
[5]

Now, pavlo is [5].

Calling foo() again modifies pavlo again:

>>> foo()
[5, 5]

Specifying a when calling foo() ensures pavlo is not touched.

>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]

So, pavlo is still [5, 5].

>>> foo()
[5, 5, 5]
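
The thought experiment above can also be run literally by giving the list a real name. This is a sketch under the assumption that foo appends 5 and returns the list, as in the original question; it additionally shows that rebinding the name pavlo afterwards does not change the object the function holds on to.

```python
pavlo = []          # the list that the answer leaves unnamed

def foo(a=pavlo):   # the default is the object pavlo refers to right now
    a.append(5)
    return a

foo()
foo()
print(pavlo)                         # [5, 5]
print(foo.__defaults__[0] is pavlo)  # True

ivan = [1, 2, 3, 4]
foo(a=ivan)
print(pavlo)                         # still [5, 5]

pavlo = []                           # rebinding the name changes nothing
print(foo())                         # [5, 5, 5]
```
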