Best method for changing a list while iterating ov

2020-06-03 02:50发布

I have several instances in a python script (v2.6) where I need to modify a list in-place. I need to pop values from the list in response to interactive input from the user and would like to know the cleanest method of doing this. Currently I have the very dirty solutions of a) setting items in the list that I want to remove to False and removing them with a filter or list comprehension or b) generating an entirely new list while going through the loop, which seems to be needlessly adding variables to the namespace and taking up memory.

An example of this problem is as follows:

for i, folder in enumerate(to_run_folders):
    if get_size(folder) < byte_threshold:
        ans = raw_input(('The folder {0}/ is less than {1}MB.' + \
                    ' Would you like to exclude it from' + \
                    ' compression? ').format(folder, megabyte_threshold))
        if 'y' in ans.strip().lower():
            to_run_folders.pop(i)

I would like to look at each folder in the list. If the current folder is less than a certain size, I want to ask the user if they want to exclude it. If they do, pop the folder from the list.

The problem with this routine is that if I iterate over the list, I get unexpected behavior and early termination. If I iterate over a copy by slicing, pop doesn't pull the right value because the indices are shifted and the problem compounds as more items are popped. I have a need for dynamic list adjustment of this kind in other areas of my script as well. Is there any clean method for this kind of functionality?

5条回答
手持菜刀,她持情操
2楼-- · 2020-06-03 03:25

You can loop over the list backwards, or use a view object.

See https://stackoverflow.com/a/181062/711085 for how to loop over a list backwards. Basically use reversed(yourList) (which happens creates a view object which visits backwards).

If you require indexing, you could do reversed(enumerate(yourList)), but that would effectively create a temporary list in memory because enumerate would need to run before reversed could kick in. You will need to either do index manipulation, or to do this:

for i in xrange(len(yourList)-1, -1, -1):
    item = yourList[i]
    ...

Even cleaner: reversed is aware of range, so you can do this in python3, or in python2 if you use xrange instead:

for i in reversed(range(len(yourList))):  
    item = yourList[i]
    ...

(proof: you can do next(reversed(range(10**10))), but this will crash your computer if using python2)

查看更多
Evening l夕情丶
3楼-- · 2020-06-03 03:28

OK, I have measured the solution. The reversed solutions are about the same. The forward while loop is about 4 times slower. BUT! The Patrik's dirty solution is about 80 times faster for the list of 100,000 random integers [bug in Patrik2 corrected]:

import timeit
import random

def solution_ninjagecko1(lst):
    for i in xrange(len(lst)-1, -1, -1):
        if lst[i] % 2 != 0:    # simulation of the choice
            del lst[i]
    return lst

def solution_jdi(lst):
    L = len(lst) - 1
    for i, v in enumerate(reversed(lst)):
        if v % 2 != 0:
            lst.pop(L-i)  # L-1 is the forward index
    return lst

def solution_Patrik(lst):
    for i, v in enumerate(lst):
        if v % 2 != 0:         # simulation of the choice
            lst[i] = None
    return [v for v in lst if v is not None]

def solution_Patrik2(lst):
    ##buggy lst = [v for v in lst if v % 2 != 0]
    ##buggy return [v for v in lst if v is not None]
    # ... corrected to
    return [v for v in lst if v % 2 != 0]

def solution_pepr(lst):
    i = 0                      # indexing the processed item
    n = 0                      # enumerating the original position
    while i < len(lst):
        if lst[i] % 2 != 0:    # simulation of the choice
            del lst[i]         # i unchanged if item deleted
        else:
            i += 1             # i moved to the next
        n += 1
    return lst

def solution_pepr_reversed(lst):
    i = len(lst) - 1           # indexing the processed item backwards
    while i > 0:
        if lst[i] % 2 != 0:    # simulation of the choice
            del lst[i]         # i unchanged if item deleted
        i -= 1                 # i moved to the previous
    return lst

def solution_steveha(lst):
    def should_keep(x):
        return x % 2 == 0
    return filter(should_keep, lst)

orig_lst = range(30)
print 'range() generated list of the length', len(orig_lst)
print orig_lst[:20] + ['...']   # to have some fun :)

lst = orig_lst[:]  # copy of the list
print solution_ninjagecko1(lst)

lst = orig_lst[:]  # copy of the list
print solution_jdi(lst)

lst = orig_lst[:]  # copy of the list
print solution_Patrik(lst)

lst = orig_lst[:]  # copy of the list
print solution_pepr(lst)

orig_lst = [random.randint(1, 1000000) for n in xrange(100000)]
print '\nrandom list of the length', len(orig_lst)
print orig_lst[:20] + ['...']   # to have some fun :)

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_ninjagecko1(lst)',
                  'from __main__ import solution_ninjagecko1, lst',
                  number=1)
print 'solution_ninjagecko1: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_jdi(lst)',
                  'from __main__ import solution_jdi, lst',
                  number=1)
print 'solution_jdi: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_Patrik(lst)',
                  'from __main__ import solution_Patrik, lst',
                  number=1)
print 'solution_Patrik: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_Patrik2(lst)',
                  'from __main__ import solution_Patrik2, lst',
                  number=1)
print 'solution_Patrik2: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_pepr_reversed(lst)',
                  'from __main__ import solution_pepr_reversed, lst',
                  number=1)
print 'solution_pepr_reversed: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_pepr(lst)',
                  'from __main__ import solution_pepr, lst',
                  number=1)
print 'solution_pepr: ', t

lst = orig_lst[:]  # copy of the list
t = timeit.timeit('solution_steveha(lst)',
                  'from __main__ import solution_steveha, lst',
                  number=1)
print 'solution_steveha: ', t

It prints on my console:

c:\tmp\_Python\Patrick\so10305762>python a.py
range() generated list of the length 30
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, '...']
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

random list of the length 100000
[915411, 954538, 794388, 847204, 846603, 454132, 866165, 640004, 930488, 609138,
 333405, 986073, 318301, 728151, 996047, 117633, 455353, 581737, 55350, 485030,
'...']
solution_ninjagecko1:  2.41921752625
solution_jdi:  2.45477176569
solution_Patrik:  0.0468565138865
solution_Patrik2:  0.024270403082
solution_pepr_reversed:  2.43338888043
solution_pepr:  9.11879694207

So, I tried with longer list. Using only twice as long one makes a big difference (on my old computer). Patrik's dirty solution behaves very nicely. It is about 200 times faster than the reversed solutions:

random list of the length 200000
[384592, 170167, 598270, 832363, 123557, 81804, 319315, 445945, 178732, 726600,
516835, 392267, 552608, 40807, 349215, 208111, 880032, 520614, 384119, 350090, 
'...']
solution_ninjagecko1:  17.362140719
solution_jdi:  17.86837545
solution_Patrik:  0.0957998851809
solution_Patrik2:  0.0500024444448
solution_pepr_reversed:  17.5078452708
solution_pepr:  52.175648581

[Added after ninjagecko's comments]

The corrected Patrik2 solution is about twice faster than 2-stage Patrick solution.

To simulate not so frequent deletion of the elements, the test like if v % 2 != 0: was changed to if v % 100 == 0:. Then about 1 % of items should be deleted. It is obvious that it takes less time. For 500,000 random integers in the list, the results are the following:

random list of the length 500000
[403512, 138135, 552313, 427971, 42358, 500926, 686944, 304889, 916659, 112636,
791585, 461948, 82622, 522768, 485408, 774048, 447505, 830220, 791421, 580706, 
'...']
solution_ninjagecko1:  6.79284210703
solution_jdi:  6.84066913532
solution_Patrik:  0.241242951269
solution_Patrik2:  0.162481823807
solution_pepr_reversed:  6.92106007886
solution_pepr:  7.12900522273

Patrick's solution is stil about 30 times faster.

[Added 2012/04/25]

Another solution that works in-place, that loops forward, and that is as fast as Patrick's solution. It does not move all the tail when the element is deleted. Instead, it moves the wanted elements to their final position and then cuts the unused tail of the list.

def solution_pepr2(lst):
    i = 0
    for v in lst:
        lst[i] = v              # moving the element (sometimes unneccessary)
        if v % 100 != 0:        # simulation of the choice
            i += 1              # here will be the next one
    lst[i:] = []                # cutting the tail of the length of the skipped
    return lst

# The following one only adds the enumerate to simulate the situation when
# it is needed -- i.e. slightly slower but with the same complexity.        
def solution_pepr2enum(lst):
    i = 0
    for n, v in enumerate(lst):
        lst[i] = v              # moving the element (sometimes unneccessary)
        if v % 100 != 0:        # simulation of the choice
            i += 1              # here will be the next one
    lst[i:] = []                # cutting the tail of the length of the skipped
    return lst

Compared with the above solutions for v % 100 != 0:

random list of the length 500000
[533094, 600755, 58260, 295962, 347612, 851487, 523927, 665648, 537403, 238660,
781030, 940052, 878919, 565870, 717745, 408465, 410781, 560173, 51010, 730322, 
'...']
solution_ninjagecko1:  1.38956896051
solution_jdi:  1.42314502685
solution_Patrik:  0.135545530079
solution_Patrik2:  0.0926935780151
solution_pepr_reversed:  1.43573239178
solution_steveha:  0.122824246805
solution_pepr2:  0.0938177241656
solution_pepr2enum:  0.11096263294
查看更多
够拽才男人
4楼-- · 2020-06-03 03:38

You can loop over it backwards

Backwards:

x = range(10)
l = len(x)-1  # max index

for i, v in enumerate(reversed(x)):
    if v % 2:
        x.pop(l-i)  # l-1 is the forward index
查看更多
萌系小妹纸
5楼-- · 2020-06-03 03:40

The best way to handle this, the most "Pythonic" way, is in fact to loop over your list and make a new list that contains just the folders you want. Here is how I would do it:

def want_folder(fname):
    if get_size(folder) >= byte_threshold:
        return True
    ans = raw_input(('The folder {0}/ is less than {1}MB.' + \
                ' Would you like to exclude it from' + \
                ' compression? ').format(folder, megabyte_threshold))
    return 'y' not in ans.strip().lower()

to_run_folders = [fname for fname in to_run_folders if want_folder(fname)]

If your list is truly large, then maybe you need to worry about the performance of this solution and use dirty tricks. But if your list is that large, it might be sort of crazy to have a human answering yes/no question about all the files that might show up.

Is performance an actual problem or just a sort of nagging worry? Because I'm pretty sure the above code is fast enough for practical use, and it is easier to understand and modify than tricky code.

EDIT: @jdi suggested, in the comments, using itertools.ifilter() or filter()

I tested, and this should actually be faster than what I showed above:

to_run_folders = filter(want_folder, to_run_folders)

I just copied @pepr's benchmarking code, and tested the solution using filter() as shown here. It was second fastest overall, with only Patrik2 being faster. Patrik2 was twice as fast, but again, any data set small enough that it is practical to have a human answering yes/no questions is probably small enough that a factor of two won't matter too much.

EDIT: Just for fun, I went ahead and wrote a version that is a pure list comprehension. It has a single expression to evaluate, no Python function call.

to_run_folders = [fname for fname in to_run_folders
        if get_size(fname) >= mb_threshold or
                'y' not in raw_input(('The folder {0}/ is less than {1}MB.' +
                ' Would you like to exclude it from compression? '
                ).format(fname, mb_threshold)).strip().lower()

]

Yuck! I much prefer making a function.

查看更多
孤傲高冷的网名
6楼-- · 2020-06-03 03:46

Currently I have the very dirty solutions of a) setting items in the list that I want to remove to False and removing them with a filter or list comprehension or b) generating an entirely new list while going through the loop, which seems to be needlessly adding variables to the namespace and taking up memory.

Actually, it is not that dirty solution. How long the list typically is? Even creating the new list should not be that much memory consumig as the list contains only references.

You can also loop in the while loop and enumerate for yourself, performing del lst[n] if the user decides (possibly counting separately the positions in the original).

查看更多
登录 后发表回答