The following code:
a = list(range(10))
remove = False
for b in a:
if remove:
a.remove(b)
remove = not remove
print(a)
Outputs [0, 2, 3, 5, 6, 8, 9]
, instead of [0, 2, 4, 6, 8]
when using Python 3.2.
- Why does it output these particular values?
- Why is no error given to indicate that underlying iterator is being modified?
- Have the mechanics changed from earlier versions of Python with respect to this behaviour?
Note that I am not looking to work around the behaviour, but to understand it.
I debated answering this for a while, because similar questions have been asked many times here. But it\'s just unique enough to be given the benefit of the doubt. (Still, I won\'t object if others vote to close.) Here\'s a visual explanation of what is happening.
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] <- b = 0; remove? no
^
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] <- b = 1; remove? yes
^
[0, 2, 3, 4, 5, 6, 7, 8, 9] <- b = 3; remove? no
^
[0, 2, 3, 4, 5, 6, 7, 8, 9] <- b = 4; remove? yes
^
[0, 2, 3, 5, 6, 7, 8, 9] <- b = 6; remove? no
^
[0, 2, 3, 5, 6, 7, 8, 9] <- b = 7; remove? yes
^
[0, 2, 3, 5, 6, 8, 9] <- b = 9; remove? no
^
Since no one else has, I\'ll attempt to answer your other questions:
Why is no error given to indicate that underlying iterator is being modified?
To throw an error without prohibiting many perfectly valid loop constructions, Python would have to know a lot about what\'s going on, and it would probably have to get that information at runtime. All that information would take time to process. It would make Python a lot slower, in just the place where speed really counts -- a loop.
Have the mechanics changed from earlier versions of Python with respect to this behaviour?
In short, no. Or at least I highly doubt it, and certainly it has behaved this way since I learned Python (2.4). Frankly I would expect any straightforward implementation of a mutable sequence to behave in just this way. Anyone who knows better, please correct me. (Actually, a quick doc lookup confirms that the text that Mikola cited has been in the tutorial since version 1.4!)
As Mikola explained, the actual result you observe is caused by the fact that deleting an entry from the list shifts the whole list over by one spot causing you to miss elements.
But the more interesting question, to my mind, is why python doesn\'t elect to produce an error message when this happens. It does produce such an error message if you try to modify a dictionary. I think there are two reasons for that.
Dict are complex internally, whereas lists are not. Lists are basically just arrays. A dict has to detect when its modified while being iterated over so as to avoid crashing when the internal structure of the dict changes. A list can get away without doing that check because it just makes sure that its current index is still in range.
Historically, (I\'m not sure about now), python lists were iterated over by using the [] operator. Python would evaluate list[0], list[1], list[2] until it got an IndexError. In that case, python wasn\'t track the size of the list before it began so it had no method of detecting that the size of list had been changed.
Of course it is not safe to modify an array as you are iterating over it. The spec says it is a bad idea and the behavior is undefined:
http://docs.python.org/tutorial/controlflow.html#for-statements
So, the next question is what exactly is happening under the hood here? If I had to guess, I would say that it is doing something like this:
for(int i=0; i<len(array); ++i)
{
do_loop_body(i);
}
If you suppose that this is indeed what is going on, then it explains the observed behavior completely. When you remove an element at or before the current pointer, you shift the whole list by 1 to the left. The first time, you remove a 1 -- like usual -- but now the list shifts backwards. The next iteration instead of hitting a 2, you hit a 3. Then you remove a 4, and the list shifts backwards. Next iteration 7, and so on.
On your first iteration, you\'re not removing and everything\'s dandy.
Second iteration you\'re at position [1] of the sequence, and you remove \'1\'. The iterator then takes you to position [2] in the sequence, which is now \'3\', so \'2\' gets skipped over (as \'2\' is now at position [1] because of the removal). Of course \'3\' doesn\'t get removed, so you go on to position [3] in the sequence, which is now \'4\'. That gets removed, taking you to position [5] which is now \'6\', and so on.
The fact that you\'re removing things means that a position gets skipped over every time you perform a removal.