I was answering this question, I preferred generator expression here and used this, which I thought would be faster as generator doesn't need to create the whole list first:
>>> lis=[['a','b','c'],['d','e','f']]
>>> 'd' in (y for x in lis for y in x)
True
And Levon used list comprehension in his solution,
>>> lis = [['a','b','c'],['d','e','f']]
>>> 'd' in [j for i in mylist for j in i]
True
But when I did the timeit results for these LC was faster than generator:
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f']]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 2.36 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f']]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 1.51 usec per loop
then I increased the size of list, and timed it again:
lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]
This time for searching 'd'
generator was faster than LC, but when I searched a middle element(11) and the last element then LC again beats the generator expression, and I can't understand why?
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in (y for x in lis for y in x)"
100000 loops, best of 3: 2.96 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "'d' in [y for x in lis for y in x]"
100000 loops, best of 3: 7.4 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "11 in [y for x in lis for y in x]"
100000 loops, best of 3: 5.61 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "11 in (y for x in lis for y in x)"
100000 loops, best of 3: 9.76 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "18 in (y for x in lis for y in x)"
100000 loops, best of 3: 8.94 usec per loop
~$ python -m timeit -s "lis=[['a','b','c'],['d','e','f'],[1,2,3],[4,5,6],[7,8,9],[10,11,12],[13,14,15],[16,17,18]]" "18 in [y for x in lis for y in x]"
100000 loops, best of 3: 7.13 usec per loop
Expanding on Paulo's answer, generator expressions are often slower than list comprehensions because of the overhead of function calls. In this case, the short-circuiting behavior of
in
offsets that slowness if the item is found fairly early, but otherwise, the pattern holds.I ran a simple script through the profiler for a more detailed analysis. Here's the script:
Here are the relevant results, reordered to make the patterns clearer.
Creating a generator expression is equivalent to creating a generator function and calling it. That accounts for one call to
<genexpr>
. Then, in the first case,next
is called 4 times, untild
is reached, for a total of 5 calls (times 100000 iterations = ncalls = 500000). In the second case, it is called 17 times, for a total of 18 calls; and in the third, 24 times, for a total of 25 calls.The genex outperforms the list comprehension in the first case, but the extra calls to
next
account for most of the difference between the speed of the list comprehension and the speed of the generator expression in the second and third cases.I'm not sure what accounts for the remaining time; it seems that generator expressions would be a hair slower even without the additional function calls. I suppose this confirms inspectorG4dget's assertion that "creating a generator comprehension has more native overhead than does a list comprehension." But in any case, this shows pretty clearly that generator expressions are slower mostly because of calls to
next
.I'll add that when short-circuiting doesn't help, list comprehensions are still faster, even for very large lists. For example:
As you can see, when short circuiting is irrelevant, list comprehensions are consistently faster even for a million-item-long list of lists. Obviously for actual uses of
in
at these scales, generators will be faster because of short-circuiting. But for other kinds of iterative tasks that are truly linear in the number of items, list comprehensions are pretty much always faster. This is especially true if you need to perform multiple tests on a list; you can iterate over an already-built list comprehension very quickly:In this case, the list comprehension is an order of magnitude faster!
Of course, this only remains true until you run out of memory. Which brings me to my final point. There are two main reasons to use a generator: to take advantage of short circuiting, and to save memory. For very large seqences/iterables, generators are the obvious way to go, because they save memory. But if short-circuiting is not an option, you pretty much never choose generators over lists for speed. You chose them to save memory, and it's always a trade-off.
Contrary to the popular belief, list comprehensions are pretty fine for moderate ranges. Iterator protocol implies calls for iterator.next(), and function calls in Python are expensive.
Of course at some point the generators memory/cpu trade-off will start to pay, but for small sets list comprehensions are very efficient.
Completely depends on the data.
Generators have a fixed setup time that must be amortized over how many items are called; List comprehensions are faster initially but will slow substantially as more memory is used with larger data sets.
Recall that as cPython lists are expanded, the list is resized in growth pattern of 4, 8, 16, 25, 35, 46, 58, 72, 88,.... For larger list comprehensions, Python may be allocating up to 4x more memory than the size of your data. Once you hit the VM --- really sloowww! But, as stated, list comprehensions are faster than generators for small data sets.
Consider case 1, a 2x26 list of lists:
Results in these timings:
Now consider case 2, that show wide disparity between a LC and gen. In this case, we are looking for one element in a 100 x 97 x 97 list of lists kind of structure:
Results in these times:
As you can see -- it depends and it is a tradeoff...