I am curious what the fastest and most Pythonic way to consume an iterator would be.
For example, say that I want to create an iterator with the map builtin that accumulates something as a side effect. I don't actually care about the result of the map, just the side effect, so I want to blow through the iteration with as little overhead or boilerplate as possible. Something like:
my_set = set()
my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)
In this example, I just want to blow through the iterator to accumulate things in my_set, and my_set is just an empty set until I actually run through my_map. Something like:
for _ in my_map:
    pass
or a naked
[_ for _ in my_map]
works, but they both feel clunky. Is there a more Pythonic way to make sure an iterator iterates quickly so that you can benefit from some side-effect?
Benchmark
I tested the two methods above on the following:
import numpy as np

my_x = np.random.randint(100, size=int(1e6))
my_y = np.random.randint(100, size=int(1e6))
with my_set and my_map as defined above. I got the following results with timeit:
for _ in my_map:
    pass
468 ms ± 20.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[_ for _ in my_map]
476 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
No real difference between the two, and they both feel clunky.
Note: I got similar performance with list(my_map), which was a suggestion in the comments.
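For reference, here is a minimal sketch of how these timings can be reproduced with the standard timeit module (the map has to be rebuilt on every run, since an exhausted iterator yields nothing on the next pass; the helper names here are just for illustration):

import timeit
import numpy as np

my_x = np.random.randint(100, size=int(1e6))
my_y = np.random.randint(100, size=int(1e6))

def consume_with_for_loop():
    # Rebuild the map each run: a consumed iterator is empty afterwards.
    my_set = set()
    my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)
    for _ in my_map:
        pass

def consume_with_listcomp():
    my_set = set()
    my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)
    [_ for _ in my_map]

# Best of 7 single-shot runs, roughly matching the %timeit setup above.
print(min(timeit.repeat(consume_with_for_loop, number=1, repeat=7)))
print(min(timeit.repeat(consume_with_listcomp, number=1, repeat=7)))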
While you shouldn't be creating a map object just for side effects, there is in fact a standard recipe for consuming iterators in the itertools docs:
import collections
from itertools import islice

def consume(iterator, n=None):
    "Advance the iterator n-steps ahead. If n is None, consume entirely."
    # Use functions that consume iterators at C speed.
    if n is None:
        # feed the entire iterator into a zero-length deque
        collections.deque(iterator, maxlen=0)
    else:
        # advance to the empty slice starting at position n
        next(islice(iterator, n, n), None)
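As a quick illustration of the two branches (a small sketch, assuming the consume definition above is in scope):

it = iter(range(10))
consume(it, 3)      # advance 3 steps; the next item is now 3
print(next(it))     # 3
consume(it)         # n is None, so drain everything that is left
print(list(it))     # []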
For just the "consume entirely" case, this can be simplified to
def consume(iterator):
    collections.deque(iterator, maxlen=0)
Using collections.deque this way avoids storing all the elements (because maxlen=0) and iterates at C speed, without bytecode interpretation overhead. There's even a dedicated fast path in the deque implementation for using a maxlen=0 deque to consume an iterator.
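Applied to the example from the question, that looks roughly like this (assuming my_x and my_y as defined in the benchmark above):

import collections

my_set = set()
my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)

# Drain the iterator purely for its side effect on my_set.
collections.deque(my_map, maxlen=0)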
Timing:
In [1]: import collections
In [2]: x = range(1000)
In [3]: %%timeit
...: i = iter(x)
...: for _ in i:
...:     pass
...:
16.5 µs ± 829 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [4]: %%timeit
...: i = iter(x)
...: collections.deque(i, maxlen=0)
...:
12 µs ± 566 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Of course, this is all based on CPython. The entire nature of interpreter overhead is very different on other Python implementations, and the maxlen=0 fast path is specific to CPython. See abarnert's answer for other Python implementations.
If you only care about CPython, deque is the fastest way, as demonstrated in user2357112's answer.1 And the same thing has been demonstrated in 2.7 and 3.2, and 32- vs. 64-bit, and Windows vs. Linux, and so on.
But that relies on an optimization in CPython's C implementation of deque. Other implementations may have no such optimization, which means they end up calling an append for each element.
In PyPy in particular, there is no such optimization in the source,2 and the JIT cannot optimize that no-op append away. (And it's hard to see how it couldn't require at least a guard test each time through the loop.) That append is of course cheap compared to the cost of looping in Python… but looping in Python is blazing fast in PyPy, almost as fast as a C loop in CPython, so this actually makes a huge difference.
Comparing the times (using identical tests as in user2357112's answer):3

          for      deque
CPython   19.7us   12.7us
PyPy      1.37us   23.3us
There are no 3.x versions of the other major interpreters, and I don't have IPython for any of them, but a quick test with Jython shows similar effects.
So, the fastest portable implementation is something like:
import sys

if sys.implementation.name == 'cpython':
    import collections
    def consume(it):
        return collections.deque(it, maxlen=0)
else:
    def consume(it):
        for _ in it:
            pass
This of course gives me 12.7us in CPython, and 1.41us in PyPy.
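Either branch exposes the same call site, so code that only wants the side effect doesn't care which implementation it got (reusing the question's example as an illustration):

my_set = set()
my_map = map(lambda x, y: my_set.add((x, y)), my_x, my_y)
consume(my_map)   # deque on CPython, a plain loop elsewhere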
1. Of course you could write a custom C extension, but it's only going to be faster by a tiny constant term—you can avoid the constructor call and the test before jumping to the fast path, but once you get into that loop, you have to do exactly what it's doing.
2. Tracing through PyPy source is always fun… but I think it ends up in the W_Deque class, which is part of the builtin _collections module.
3. CPython 3.6.4; PyPy 5.10.1/3.5.3; both from the respective standard 64-bit macOS installers.