I have several instances in a python script (v2.6) where I need to modify a list in-place. I need to pop values from the list in response to interactive input from the user and would like to know the cleanest method of doing this. Currently I have the very dirty solutions of a) setting items in the list that I want to remove to False and removing them with a filter or list comprehension or b) generating an entirely new list while going through the loop, which seems to be needlessly adding variables to the namespace and taking up memory.
An example of this problem is as follows:
for i, folder in enumerate(to_run_folders):
if get_size(folder) < byte_threshold:
ans = raw_input(('The folder {0}/ is less than {1}MB.' + \
' Would you like to exclude it from' + \
' compression? ').format(folder, megabyte_threshold))
if 'y' in ans.strip().lower():
to_run_folders.pop(i)
I would like to look at each folder in the list. If the current folder is less than a certain size, I want to ask the user if they want to exclude it. If they do, pop the folder from the list.
The problem with this routine is that if I iterate over the list, I get unexpected behavior and early termination. If I iterate over a copy by slicing, pop doesn't pull the right value because the indices are shifted and the problem compounds as more items are popped. I have a need for dynamic list adjustment of this kind in other areas of my script as well. Is there any clean method for this kind of functionality?
You can loop over the list backwards, or use a view object.
See https://stackoverflow.com/a/181062/711085 for how to loop over a list backwards. Basically use
reversed(yourList)
(which happens creates a view object which visits backwards).If you require indexing, you could do
reversed(enumerate(yourList))
, but that would effectively create a temporary list in memory becauseenumerate
would need to run beforereversed
could kick in. You will need to either do index manipulation, or to do this:Even cleaner:
reversed
is aware ofrange
, so you can do this in python3, or in python2 if you usexrange
instead:(proof: you can do
next(reversed(range(10**10)))
, but this will crash your computer if using python2)OK, I have measured the solution. The reversed solutions are about the same. The forward
while
loop is about 4 times slower. BUT! The Patrik's dirty solution is about 80 times faster for the list of 100,000 random integers [bug in Patrik2 corrected]:It prints on my console:
So, I tried with longer list. Using only twice as long one makes a big difference (on my old computer). Patrik's dirty solution behaves very nicely. It is about 200 times faster than the reversed solutions:
[Added after ninjagecko's comments]
The corrected Patrik2 solution is about twice faster than 2-stage Patrick solution.
To simulate not so frequent deletion of the elements, the test like
if v % 2 != 0:
was changed toif v % 100 == 0:
. Then about 1 % of items should be deleted. It is obvious that it takes less time. For 500,000 random integers in the list, the results are the following:Patrick's solution is stil about 30 times faster.
[Added 2012/04/25]
Another solution that works in-place, that loops forward, and that is as fast as Patrick's solution. It does not move all the tail when the element is deleted. Instead, it moves the wanted elements to their final position and then cuts the unused tail of the list.
Compared with the above solutions for
v % 100 != 0
:You can loop over it backwards
Backwards:
The best way to handle this, the most "Pythonic" way, is in fact to loop over your list and make a new list that contains just the folders you want. Here is how I would do it:
If your list is truly large, then maybe you need to worry about the performance of this solution and use dirty tricks. But if your list is that large, it might be sort of crazy to have a human answering yes/no question about all the files that might show up.
Is performance an actual problem or just a sort of nagging worry? Because I'm pretty sure the above code is fast enough for practical use, and it is easier to understand and modify than tricky code.
EDIT: @jdi suggested, in the comments, using
itertools.ifilter()
orfilter()
I tested, and this should actually be faster than what I showed above:
I just copied @pepr's benchmarking code, and tested the solution using
filter()
as shown here. It was second fastest overall, with only Patrik2 being faster. Patrik2 was twice as fast, but again, any data set small enough that it is practical to have a human answering yes/no questions is probably small enough that a factor of two won't matter too much.EDIT: Just for fun, I went ahead and wrote a version that is a pure list comprehension. It has a single expression to evaluate, no Python function call.
Yuck! I much prefer making a function.
Actually, it is not that dirty solution. How long the list typically is? Even creating the new list should not be that much memory consumig as the list contains only references.
You can also loop in the
while
loop and enumerate for yourself, performingdel lst[n]
if the user decides (possibly counting separately the positions in the original).