Question:
This is more of a conceptual question. I recently saw a piece of code in Python (it worked in 2.7, and it might also have been run in 2.5) in which a for loop used the same name for both the list being iterated over and the item in the list, which strikes me as both bad practice and something that should not work at all.
For example:
x = [1,2,3,4,5]
for x in x:
    print x
print x
Yields:
1
2
3
4
5
5
Now, it makes sense to me that the last value printed would be the last value assigned to x from the loop, but I fail to understand why you'd be able to use the same variable name for both parts of the for loop and have it function as intended. Are they in different scopes? What's going on under the hood that allows something like this to work?
Answer 1:
What does dis tell us:
Python 3.4.1 (default, May 19 2014, 13:10:29)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.40)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from dis import dis
>>> dis("""x = [1,2,3,4,5]
... for x in x:
...     print(x)
... print(x)""")
  1           0 LOAD_CONST               0 (1)
              3 LOAD_CONST               1 (2)
              6 LOAD_CONST               2 (3)
              9 LOAD_CONST               3 (4)
             12 LOAD_CONST               4 (5)
             15 BUILD_LIST               5
             18 STORE_NAME               0 (x)

  2          21 SETUP_LOOP              24 (to 48)
             24 LOAD_NAME                0 (x)
             27 GET_ITER
        >>   28 FOR_ITER                16 (to 47)
             31 STORE_NAME               0 (x)

  3          34 LOAD_NAME                1 (print)
             37 LOAD_NAME                0 (x)
             40 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             43 POP_TOP
             44 JUMP_ABSOLUTE           28
        >>   47 POP_BLOCK

  4     >>   48 LOAD_NAME                1 (print)
             51 LOAD_NAME                0 (x)
             54 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             57 POP_TOP
             58 LOAD_CONST               5 (None)
             61 RETURN_VALUE
The key bits are sections 2 and 3 - we load the value out of x (24 LOAD_NAME 0 (x)), then we get its iterator (27 GET_ITER) and start iterating over it (28 FOR_ITER). Python never goes back to load the iterator again.
(Aside: it wouldn't make any sense to do so, since it already has the iterator, and, as Abhijit points out in his answer, Section 7.3 of Python's specification actually requires this behavior.)
When the name x gets overwritten to point at each value inside the list formerly known as x, Python doesn't have any problem finding the iterator, because it never needs to look at the name x again to finish the iteration protocol.
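The same check can be made programmatically. The following is a minimal sketch, assuming Python 3.4 or later (where dis.get_instructions is available); the names source and opnames are illustrative. Exact opcodes vary between interpreter versions, so it only looks for the two instructions discussed above:

```python
import dis

# The snippet from the question, in Python 3 syntax.
source = """x = [1, 2, 3, 4, 5]
for x in x:
    print(x)
print(x)"""

# Collect the opcode names the compiler emits for the snippet.
opnames = [instruction.opname for instruction in dis.get_instructions(source)]

# The iterator is requested once, before the loop starts;
# FOR_ITER then pulls items from it on every pass.
print(opnames.count("GET_ITER"))
print("FOR_ITER" in opnames)
```

Because GET_ITER sits outside the loop body, rebinding x inside the loop has nothing left to interfere with.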
Answer 2:
Using your example code as the core reference:
x = [1,2,3,4,5]
for x in x:
    print x
print x
I would like to refer you to section 7.3, "The for statement", in the manual.
Excerpt 1
The expression list is evaluated once; it should yield an iterable
object. An iterator is created for the result of the expression_list.
What this means is that your variable x, which is a symbolic name for a list object, [1,2,3,4,5], is evaluated to an iterable object. Even if the variable (the symbolic reference) changes its allegiance, the expression list is not evaluated again, so there is no impact on the iterable object that has already been evaluated and generated.
Note
- Everything in Python is an object, with an identity, attributes, and methods.
- A variable is a symbolic name: a reference to one and only one object at any given instant.
- A variable can change its allegiance at run time, i.e. it can refer to some other object.
Excerpt 2
The suite is then executed once for each item provided by the
iterator, in the order of ascending indices.
Here the suite is driven by the iterator and not by the expression list. So, on each iteration, the iterator is asked to yield the next item; the original expression list is not consulted again.
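The "evaluated once" rule from Excerpt 1 is easy to observe. Here is a small sketch in Python 3, using a hypothetical helper make_items() (not part of the original question) that records every time it is called:

```python
calls = []

def make_items():
    # Hypothetical helper: records each evaluation of the expression list.
    calls.append("evaluated")
    return [10, 20, 30]

seen = []
for item in make_items():
    seen.append(item)

# The loop body ran three times, but make_items() was called only once.
print(len(calls), seen)
```

If the expression list were re-evaluated on every pass, calls would grow with each iteration.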
Answer 3:
It is necessary for it to work this way, if you think about it. The expression for the sequence of a for loop could be anything:
binaryfile = open("file", "rb")
for byte in binaryfile.read(5):
    ...
We can't query the sequence on each pass through the loop, or here we'd end up reading from the next batch of 5 bytes the second time. Naturally Python must in some way store the result of the expression privately before the loop begins.
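A runnable variant of this idea, as a sketch in Python 3: an in-memory io.BytesIO stands in for the file on disk (an assumption made so the example is self-contained), and the stream position shows that read(5) ran exactly once.

```python
import io

# In-memory stand-in for open("file", "rb"); the contents are made up.
binaryfile = io.BytesIO(b"abcdefghij")

seen = []
for byte in binaryfile.read(5):
    # In Python 3, iterating over a bytes object yields integers.
    seen.append(byte)

# Only the first five bytes were consumed, so the position is 5, not 10.
print(seen, binaryfile.tell())
```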
Are they in different scopes?
No. To confirm this you could keep a reference to the original scope dictionary (locals()) and notice that you are in fact using the same variables inside the loop:
x = [1,2,3,4,5]
loc = locals()
for x in x:
    print locals() is loc # True
    print loc["x"] # 1
    break
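Another way to see that the loop introduces no new scope, sketched in Python 3 with a hypothetical function loop_shares_scope(): a variable assigned before the loop is overwritten by the loop target and keeps the last item afterwards.

```python
def loop_shares_scope():
    value = "before"
    for value in ["a", "b"]:
        pass
    # The loop target is the same local variable, so it now holds "b".
    return value

result = loop_shares_scope()
print(result)
```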
What's going on under the hood that allows something like this to work?
Sean Vieira showed exactly what is going on under the hood, but to describe it in more readable Python code, your for loop is essentially equivalent to this while loop:
it = iter(x)
while True:
    try:
        x = it.next()
    except StopIteration:
        break
    print x
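The loop above is Python 2 (it.next() and the print statement). A Python 3 rendering of the same idea might look like the sketch below; run_for_loop is a hypothetical wrapper added only so the result can be inspected, and it collects items in a list rather than printing them.

```python
def run_for_loop(x):
    printed = []
    it = iter(x)           # the only time the original list is looked up via x
    while True:
        try:
            x = next(it)   # rebinding x does not affect the iterator
        except StopIteration:
            break
        printed.append(x)
    return printed, x

printed, final_x = run_for_loop([1, 2, 3, 4, 5])
print(printed, final_x)
```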
This is different from the traditional indexing approach to iteration you would see in older versions of Java, for example:
for (int index = 0; index < x.length; index++) {
    x = x[index];
    ...
}
This approach would fail when the item variable and the sequence variable are the same, because the sequence x would no longer be available to look up the next index after the first time x was reassigned to the first item.
With the former approach, however, the first line (it = iter(x)) requests an iterator object, which is what is actually responsible for providing the next item from then on. The sequence that x originally pointed to no longer needs to be accessed directly.
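For contrast, the index-based style can be mimicked in Python to watch it break. This is only an illustrative sketch in Python 3, not how the for statement is implemented: after the first pass x is an integer, so the next bounds check fails.

```python
x = [1, 2, 3, 4, 5]
index = 0
error = None

try:
    while index < len(x):   # second pass: len() of an int raises TypeError
        x = x[index]        # first pass: x is rebound to the item 1
        index += 1
except TypeError as exc:
    error = exc

print(x, type(error).__name__)
```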
Answer 4:
It's the difference between a variable (x) and the object it points to (the list). When the for loop starts, Python grabs an internal reference to the object pointed to by x. It uses the object and not what x happens to reference at any given time.
If you reassign x, the for loop doesn't change. If x points to a mutable object (e.g., a list) and you change that object (e.g., delete an element), the results can be unpredictable.
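The difference between rebinding the name and mutating the object can be sketched as follows in Python 3. The skipped element in the second loop is what CPython's list iterator happens to produce for this particular input; the variable names are illustrative.

```python
# Rebinding the name: the loop still visits every element.
x = [1, 2, 3]
seen_rebinding = []
for x in x:
    seen_rebinding.append(x)

# Mutating the list being iterated: an element gets skipped.
items = [1, 2, 3, 4, 5]
seen_mutating = []
for item in items:
    seen_mutating.append(item)
    if item == 2:
        items.remove(1)   # shifts the remaining items left by one

print(seen_rebinding, seen_mutating)
```

Removing an element while the iterator is at position 2 shifts 3 into a slot the iterator has already passed, so 3 is never visited.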
Answer 5:
Basically, the for loop takes in the list x and then, storing that as a temporary variable, reassigns x to each value in that temporary variable. Thus, x is now the last value in the list.
>>> x = [1, 2, 3]
>>> [x for x in x]
[1, 2, 3]
>>> x
3
>>>
Just like in this:
>>> def foo(bar):
...     return bar
...
>>> x = [1, 2, 3]
>>> for x in foo(x):
...     print x
...
1
2
3
>>>
In this example, x is stored in foo() as bar, so although x is being reassigned, it still exists (or existed) in foo(), so that we could use it to drive our for loop.
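A note on versions: the list-comprehension session earlier in this answer reflects Python 2, where the comprehension variable leaks into the enclosing scope. In Python 3 a comprehension has its own scope, so the outer x is left alone, while a plain for loop still rebinds it. A small sketch, written for Python 3:

```python
x = [1, 2, 3]

result = [x for x in x]
x_after_comprehension = x   # still the list in Python 3

for x in x:
    pass
x_after_loop = x            # the for statement rebinds the outer name

print(result, x_after_comprehension, x_after_loop)
```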
Answer 6:
x no longer refers to the original x list, and so there's no confusion. Basically, Python remembers it's iterating over the original x list, but as soon as you start assigning the iteration value (0, 1, 2, etc.) to the name x, it no longer refers to the original x list. The name gets reassigned to the iteration value.
In [1]: x = range(5)
In [2]: x
Out[2]: [0, 1, 2, 3, 4]
In [3]: id(x)
Out[3]: 4371091680
In [4]: for x in x:
   ...:     print id(x), x
   ...:
140470424504688 0
140470424504664 1
140470424504640 2
140470424504616 3
140470424504592 4
In [5]: id(x)
Out[5]: 140470424504592
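The session above is Python 2, where range(5) returns a list. A comparable check in Python 3 is sketched below; the extra name original is an assumption added so the list can still be inspected after x has been rebound.

```python
x = list(range(5))
original = x            # second reference to the same list object
original_id = id(x)

for x in x:
    pass

# x now names the last item, while the list itself is untouched.
print(x, original, id(original) == original_id)
```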