Should I create a copy when iterating over a list

2019-08-09 06:06发布

问题:

When answering this question, I came across something I never thought about in Python (pointed by a user).

Basically, I already know (here's an interesting thread about it) that I have to make a copy when iterating while mutating a list in Python in order to avoid strange behaviors.

Now, my question is, is using enumerate overcoming that problem ?

test_list = [1,2,3,4]
for index,item in enumerate(test_list):
    if item == 1:
        test_list.pop(index)

Would this code be considered safe or I should use,

for index,item in enumerate(test_list[:]):

回答1:

First, let’s answer your direct question:

enumerate doesn’t help anything here. It works as if it held an iterator to the underlying iterable (and, at least in CPython, that’s exactly what it does), so anything that wouldn’t be legal or safe to do with a list iterator isn’t legal or safe to do with an enumerate object wrapped around that list iterator.


Your original use case—setting test_list[index] = new_value—is safe in practice—but I’m not sure whether it’s guaranteed to be safe or not.

Your new use case—calling test_list.pop(index)—is probably not safe.


The most obvious implementation of a list iterator is basically just a reference to the list and an index into that list. So, if you insert or delete at the current position, or to the left of that position, you’re definitely breaking the iterator. For example, if you delete lst[i], that shifts everything from i + 1 to the end up one position, so when you move on to i + 1, you’re skipping over the original i + 1th value, because it’s now the ith. But if you insert or delete to the right of the current position, that’s not a problem.

Since test_list.pop(index) deletes at or left of the current position, it's not safe even with this implementation. (Of course if you've carefully written your algorithm so that skipping over the value after a hit never matters, maybe even that's fine. But more algorithms won't handle that.)

It’s conceivable that a Python implementation could instead store a raw pointer to the current position in the array used for the list storage. Which would mean that inserting anywhere could break the iterator, because an insert can cause the whole list to get reallocated to new memory. And so could deleting anywhere, if the implementation sometimes reallocates lists on shrinking. I don't think the Python disallows implementations that do all of this, so if you want to be paranoid, it may be safer to just never insert or delete while iterating.

If you’re just replacing an existing value, it’s hard to imagine how that could break the iterator under any reasonable implementation. But, as far as I'm aware, the language reference and list library reference1 don't actually make any promises about the implementation of list iterators.2

So, it's up to you whether you care about "safe in my implementation", "safe in every implementation every written to date", "safe in every conceivable (to me) implementation", or "guaranteed safe by the reference".

I think most people happily replace list items during iteration, but avoid shrinking or growing the list. However, there's definitely production code out there that at least deletes to the right of the iterator.


1. I believe the tutorial just says somewhere to never modify any data structure while iterating over it—but that’s the tutorial. It’s certainly safe to always follow that rule, but it may also be safe to follow a less strict rule.

2. Except that if the key function or anything else tries to access the list in any way in the middle of a sort, the result is undefined.



回答2:

Since it was my comment which lead to this, I'll add my follow up:

enumerate can be thought of as a generator so it will just take a sequence (any sequence or iterator actually) and just "generate" an incrementing index to be yielded with each item from the passed sequence (so it's not making a copy or mutating the list anyway just using an "enumerate object").

In the case of the code in that question, you were never changing the length of the list you were iterating over and once the if statement was run the value of the element did not matter. So the copy wasn't needed, it will be needed when an element is removed as the iterator index is shared with the list and does not account for removed elements.

The Python Ninja has a great example of when you should use a copy (or move to list comprehension)