Is there an efficient way to know how many elements are in an iterator in Python, in general, without iterating through each and counting?
No. It's not possible.
Example:
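A minimal sketch of such a case, assuming a generator whose output depends on random coin flips (the `gen` function and its argument are illustrative names, not recovered from the original answer):

```python
import random

def gen(n):
    """Yield each i in range(n) only when a coin flip comes up 0."""
    for i in range(n):
        if random.randint(0, 1) == 0:
            yield i

iterator = gen(10)

# Nothing has run yet: how many items come out depends on coin flips
# that only happen while the generator is being advanced.
count = sum(1 for _ in iterator)
print(count)  # somewhere between 0 and 10, different from run to run
```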
Length of `iterator` is unknown until you iterate through it.

I like the cardinality package for this. It is very lightweight and tries to use the fastest possible implementation available, depending on the iterable.
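To illustrate what "fastest possible implementation depending on the iterable" can mean, here is a self-contained sketch. It is an assumption about the general technique (use `len()` when the object supports it, otherwise exhaust the iterable in constant memory), not verified package source, and `count_items` is a hypothetical name:

```python
import collections

def count_items(iterable):
    """Count items, preferring len() when the object supports it."""
    if hasattr(iterable, "__len__"):
        return len(iterable)  # no iteration needed for lists, tuples, dicts, ...
    # Otherwise exhaust the iterable. A deque with maxlen=1 keeps only the
    # last (position, item) pair, so memory use stays constant.
    last = collections.deque(enumerate(iterable, 1), maxlen=1)
    return last[0][0] if last else 0

print(count_items([1, 2, 3]))
print(count_items(i for i in range(500)))
```

Note that the second call still consumes the generator; the shortcut only helps for objects that already know their length.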
Usage:
The actual `count()` implementation is as follows:

No, any method will require you to resolve every result. You can do `len(list(iterable))`, but running that on an infinite iterator will of course never return. It will also consume the iterator, which will need to be recreated if you want to use the contents.
Telling us what real problem you're trying to solve might help us find you a better way to accomplish your actual goal.
Edit: Using `list()` will read the whole iterable into memory at once, which may be undesirable. Another way is to do `sum(1 for _ in iterable)`, as another person posted. That will avoid keeping it in memory.
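The two approaches side by side, as a small sketch (the helper names `count_with_list` and `count_with_sum` are made up for illustration). Both consume the iterator, so nothing is left afterwards:

```python
def count_with_list(iterable):
    # Materialises every item in a list first: simple, but O(n) memory.
    return len(list(iterable))

def count_with_sum(iterable):
    # Adds 1 per item without storing the items: O(1) extra memory.
    return sum(1 for _ in iterable)

gen = (x * x for x in range(1000))
n = count_with_sum(gen)
leftover = list(gen)  # the generator is already exhausted at this point

print(n, leftover)
```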
Kinda. You could check the `__length_hint__` method, but be warned that (at least up to Python 3.4, as gsnedders helpfully points out) it's an undocumented implementation detail (following message in thread) that could very well vanish or summon nasal demons instead.

Otherwise, no. Iterators are just objects that only expose the `next()` method (`__next__()` in Python 3). You can call it as many times as required, and they may or may not eventually raise `StopIteration`. Luckily, this behaviour is most of the time transparent to the coder. :)

A quick benchmark:
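A sketch of what such a benchmark could look like. The bodies below are assumptions, not the original benchmark code: `count_iter_items` is assumed to drain the iterable through a zero-length `deque`, and `count_with_sum` and `make_gen` are illustrative names.

```python
import collections
import itertools
import timeit

def count_iter_items(iterable):
    """Count items by draining zip(iterable, counter) into a zero-length deque."""
    counter = itertools.count()
    # zip() pulls from `iterable` first, so the counter advances exactly
    # once per item that the iterable actually produces.
    collections.deque(zip(iterable, counter), maxlen=0)
    return next(counter)

def count_with_sum(iterable):
    return sum(1 for _ in iterable)

def make_gen(n=100_000):
    return (i for i in range(n))

timings = {}
for func in (count_iter_items, count_with_sum):
    timings[func.__name__] = timeit.timeit(lambda: func(make_gen()), number=5)

# Print the candidates from fastest to slowest on this machine.
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.3f} s")
```

Timings depend on the interpreter version and the machine, so treat any ranking from a run like this as local evidence only.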
The results:
I.e. the simple `count_iter_items` is the way to go.
So, for those who would like a summary of that discussion: here are the final top scores for counting a generator expression of 50 million items using `len(list(gen))`, `len([_ for _ in gen])`, `sum(1 for _ in gen)`, `ilen(gen)` (from more_itertools), and `reduce(lambda c, i: c + 1, gen, 0)`. Sorted by execution performance (including memory consumption), the results may surprise you:
```
1: test_list.py:8: 0.492 KiB
('list, sec', 1.9684218849870376)
2: test_list_compr.py:8: 0.867 KiB
('list_compr, sec', 2.5885991149989422)
3: test_sum.py:8: 0.859 KiB
('sum, sec', 3.441088170016883)
4: more_itertools/more.py:413: 1.266 KiB
('ilen, sec', 9.812256851990242)
5: test_reduce.py:8: 0.859 KiB
('reduce, sec', 13.436614598002052)
```
So, in this test `len(list(gen))` is the fastest and also shows the lowest memory figure.
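To check figures like these on your own machine, a measurement along the following lines could be used. It is a sketch, not the original test scripts: it assumes `tracemalloc` peak usage as the memory metric (which may differ from how the numbers above were collected), uses a much smaller `N`, and leaves out `ilen`, which needs more_itertools. The names `COUNTERS` and `measure` are illustrative.

```python
import time
import tracemalloc
from functools import reduce

N = 200_000  # far smaller than the 50 million used above, to keep the run short

COUNTERS = {
    "list": lambda gen: len(list(gen)),
    "list_compr": lambda gen: len([_ for _ in gen]),
    "sum": lambda gen: sum(1 for _ in gen),
    "reduce": lambda gen: reduce(lambda c, i: c + 1, gen, 0),
}

def measure(counter, n=N):
    """Return (count, seconds, peak_bytes) for one counting strategy."""
    gen = (i for i in range(n))
    tracemalloc.start()
    started = time.perf_counter()
    count = counter(gen)
    seconds = time.perf_counter() - started
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return count, seconds, peak

results = {name: measure(fn) for name, fn in COUNTERS.items()}
for name, (count, seconds, peak) in results.items():
    print(f"{name}: count={count}, {seconds:.3f} s, peak {peak / 1024:.1f} KiB")
```

Tracing allocations slows every strategy down, so the timings from this harness are only comparable with each other, not with untraced runs.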