Single list iteration vs multiple list comprehensions

Published 2019-07-30 00:30

Question:

I have a list of data from which I need to copy some of its elements into a couple of different lists. Would it be better to do a single iteration over the list or to perform multiple list comprehensions?

E.g.

def split_data(data):
    a = []
    b = []
    c = []
    for d in data:
        if d[0]   >   1 : a.append(d)
        if d[1]   == 'b': b.append(d)
        if len(d) ==  3 : c.append(d)

    return a, b, c

Versus

def split_data(data):
    a = [d for d in data if d[0]   >   1 ]
    b = [d for d in data if d[1]   == 'b']
    c = [d for d in data if len(d) ==  3 ]

    return a, b, c

I know the more Pythonic way of doing this is with list comprehensions, but is that the case in this instance?

Answer 1:

Your first example only needs to iterate through the data once, with multiple if statements, while the latter needs to iterate through the data three times. I believe a list comprehension will win most of the time when the number of iterations over the data is the same.
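To make the "once versus three times" point concrete, here is a minimal sketch that counts how many times iteration over the input list starts. CountingList, split_loop and split_comprehensions are hypothetical names introduced for illustration; the two functions are the question's two versions, and the sample data is made up so that every condition is valid in Python 3.

```python
class CountingList(list):
    """Hypothetical helper: a list that counts how many times iteration over it begins."""

    def __init__(self, *args):
        super().__init__(*args)
        self.passes = 0

    def __iter__(self):
        # called once per for-loop or comprehension that walks the list
        self.passes += 1
        return super().__iter__()


def split_loop(data):
    # single pass: every element is tested against all three conditions
    a, b, c = [], [], []
    for d in data:
        if d[0] > 1:
            a.append(d)
        if d[1] == 'b':
            b.append(d)
        if len(d) == 3:
            c.append(d)
    return a, b, c


def split_comprehensions(data):
    # three passes: each comprehension walks the whole list again
    a = [d for d in data if d[0] > 1]
    b = [d for d in data if d[1] == 'b']
    c = [d for d in data if len(d) == 3]
    return a, b, c


sample = [(0, 'b'), (2, 'x', 'y'), (3, 'b', 'z')]

loop_data = CountingList(sample)
comp_data = CountingList(sample)
assert split_loop(loop_data) == split_comprehensions(comp_data)
print(loop_data.passes, comp_data.passes)  # → 1 3
```

Both versions produce identical lists; the difference is only in how many passes are made, which is why the per-pass speed of a comprehension has to be weighed against the extra passes.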

For a simple operation like your example, I would prefer the list comprehension method. When the operation becomes more complex, the loop may be better for the sake of readability.

Benchmarking the two functions should tell you more. Below are the runtimes from my quick benchmark of both functions on a dummy data set. These numbers may not hold for every data set.

# without list comprehension
>>> import timeit
>>> timeit.timeit('__main__.split_data([("a","b")] * 1000000)', 'import __main__', number=1)
0.43826036048574224

# with list comprehension
>>> timeit.timeit('__main__.split_data([("a","b")] * 1000000)', 'import __main__', number=1)
0.31136326966964134
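One caveat about the dummy data above: with ("a", "b"), the test d[0] > 1 compares a string with an integer, which raises TypeError in Python 3, so those timings were presumably taken under Python 2. Below is a sketch of a Python 3 benchmark, assuming hypothetical data with an integer first element; the data shape and size are arbitrary choices, and no timings are shown because they depend on the machine.

```python
import timeit


def split_loop(data):
    # one pass over data, three tests per element
    a, b, c = [], [], []
    for d in data:
        if d[0] > 1:
            a.append(d)
        if d[1] == 'b':
            b.append(d)
        if len(d) == 3:
            c.append(d)
    return a, b, c


def split_comprehensions(data):
    # three passes over data, one test per element per pass
    a = [d for d in data if d[0] > 1]
    b = [d for d in data if d[1] == 'b']
    c = [d for d in data if len(d) == 3]
    return a, b, c


# Hypothetical dummy data: integer first element so d[0] > 1 is valid in Python 3
data = [(i % 4, 'b' if i % 2 else 'a') for i in range(200_000)]

# Both versions must agree before their timings are worth comparing
assert split_loop(data) == split_comprehensions(data)

for func in (split_loop, split_comprehensions):
    # best of 5 single runs reduces noise from other processes
    best = min(timeit.repeat(lambda: func(data), number=1, repeat=5))
    print(f"{func.__name__}: {best:.3f} s")
```

Taking the minimum of several repeats, rather than a single run with number=1, is the usual way to reduce noise in timeit measurements.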


Answer 2:

I'd say it depends. If your data is a list and comparatively small, you could go for list comprehensions. However, if your data is comparatively large (hint: %timeit is your friend), your first option iterates over it only once and might therefore be more efficient.

Also note that your first version works with any iterable, including generators, whereas the second version won't work with generators: the first comprehension consumes every item, leaving nothing for the other two. You could even chain this by providing a generator yourself, i.e., using yield a, b, c instead of return.
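A minimal sketch of the generator pitfall described above, assuming a hypothetical generator records() and made-up records. The single-loop version sees every item; in the comprehension version, the first comprehension exhausts the generator and the other two get nothing.

```python
def split_loop(data):
    # single pass: works with any iterable, including a one-shot generator
    a, b, c = [], [], []
    for d in data:
        if d[0] > 1:
            a.append(d)
        if d[1] == 'b':
            b.append(d)
        if len(d) == 3:
            c.append(d)
    return a, b, c


def split_comprehensions(data):
    # three passes: only the first one sees items if data is a generator
    a = [d for d in data if d[0] > 1]
    b = [d for d in data if d[1] == 'b']
    c = [d for d in data if len(d) == 3]
    return a, b, c


def records():
    # hypothetical generator: yields each item once, then it is exhausted
    yield (2, 'b')
    yield (3, 'x', 'y')


print(split_loop(records()))
# → ([(2, 'b'), (3, 'x', 'y')], [(2, 'b')], [(3, 'x', 'y')])
print(split_comprehensions(records()))
# → ([(2, 'b'), (3, 'x', 'y')], [], [])
```

Adding data = list(data) at the top of the comprehension version restores correct results, at the cost of holding a full copy of the input in memory.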



Answer 3:

If you want to be more Pythonic, we can consult the Zen of Python:

Explicit is better than implicit.

Sparse is better than dense.

Readability counts.

Although both are readable, I'd say your first example is more readable. If your data had more dimensions, required more nested for loops, and involved more logic, the first example would make it clearer how you want to handle each nested element.
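As a rough illustration of that argument, here is a sketch with hypothetical nested data: groups of (value, tag) records in which a value may be None and should be skipped. The data shape, the None rule, and both function names are assumptions made up for this example. In the loop, the skip logic is written once; in the comprehension version, the None check has to be repeated in every comprehension.

```python
def split_nested(groups):
    """Hypothetical nested case: groups of (value, tag) records with per-record logic."""
    large, tagged, skipped = [], [], []
    for group_id, group in enumerate(groups):
        for value, tag in group:
            if value is None:
                # missing values are recorded once and not tested further
                skipped.append(group_id)
                continue
            if value > 1:
                large.append((group_id, value))
            if tag == 'b':
                tagged.append((group_id, value))
    return large, tagged, skipped


def split_nested_comprehensions(groups):
    # same result, but the None check is repeated in each comprehension
    large = [(gid, v) for gid, g in enumerate(groups) for v, t in g if v is not None and v > 1]
    tagged = [(gid, v) for gid, g in enumerate(groups) for v, t in g if v is not None and t == 'b']
    skipped = [gid for gid, g in enumerate(groups) for v, t in g if v is None]
    return large, tagged, skipped


groups = [[(0, 'b'), (5, 'a')], [(None, 'b'), (2, 'b')]]
print(split_nested(groups))
# → ([(0, 5), (1, 2)], [(0, 0), (1, 2)], [1])
assert split_nested_comprehensions(groups) == split_nested(groups)
```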

Although Skycc's answer does show slightly faster results for list comprehensions, ideally you should aim for readability first and then optimize, unless you really need the small speedup that list comprehensions give.