I have a class with both an __iter__ and a __len__ method. The latter uses the former to count all elements.
It works like the following:
class A:
    def __iter__(self):
        print("iter")
        for _ in range(5):
            yield "something"

    def __len__(self):
        print("len")
        n = 0
        for _ in self:
            n += 1
        return n
Now if we take e.g. the length of an instance, it prints len and iter, as expected:
>>> len(A())
len
iter
5
But if we call list() it calls both __iter__ and __len__:
>>> list(A())
len
iter
iter
['something', 'something', 'something', 'something', 'something']
It works as expected if we make a generator expression:
>>> list(x for x in A())
iter
['something', 'something', 'something', 'something', 'something']
I would assume list(A()) and list(x for x in A()) to work the same, but they don't.
Note that it appears to first call __iter__, then __len__, then loop over the iterator:
class B:
    def __iter__(self):
        print("iter")
        def gen():
            print("gen")
            yield "something"
        return gen()

    def __len__(self):
        print("len")
        return 1

print(list(B()))
Output:
iter
len
gen
['something']
How can I get list() not to call __len__ so that my instance's iterator is not consumed twice? I could define e.g. a length or size method and one would then call A().size(), but that's less pythonic.
I tried to compute the length in __iter__ and cache it so that subsequent calls to __len__ don't need to iterate again, but list() calls __len__ before it starts iterating, so it doesn't work.
Note that in my case I work on very large data collections so caching all items is not an option.
It's a safe bet that the list() constructor is detecting that len() is available and calling it in order to pre-allocate storage for the list.
Your implementation is pretty much completely backwards. You are implementing __len__() by using __iter__(), which is not what Python expects. The expectation is that len() is a fast, efficient way to determine the length in advance.
I don't think you can convince list(A()) not to call len. As you have already observed, you can create an intermediate step that prevents len from being called.
You should definitely cache the result, if the sequence is immutable. If there are as many items as you speculate, there's no sense computing len more than once.
You don't have to implement __len__. For a class to be iterable, it just needs to implement either of the following:
- __iter__, which returns an iterator, or a generator as in your classes A and B
- __getitem__, as long as it raises IndexError when the index is out of range
The code below still works:
Which outputs:
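As an illustrative sketch of the points above (the class names Lazy, Indexed and Counted are made up for this example and do not come from the question), the following shows that list() accepts an object with only __iter__ or only __getitem__, and that passing iter(obj) as the intermediate step keeps list() from touching the object's __len__:

```python
class Lazy:
    """Hypothetical iterable that defines only __iter__ (no __len__)."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # A generator function: each call returns a fresh iterator.
        for i in range(self.n):
            yield i


class Indexed:
    """Hypothetical iterable that defines only __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Iteration asks for 0, 1, 2, ... and stops at the first IndexError.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * 10


class Counted:
    """Hypothetical class whose __len__ iterates, as in the question."""

    def __init__(self, n):
        self.n = n
        self.len_calls = 0  # how often __len__ was invoked

    def __iter__(self):
        return iter(range(self.n))

    def __len__(self):
        self.len_calls += 1
        return sum(1 for _ in self)


print(list(Lazy(3)))
print(list(Indexed(3)))

c = Counted(4)
print(list(iter(c)))  # list() only sees the iterator, not c itself
print(c.len_calls)    # number of __len__ calls made so far
```

With this arrangement len(c) remains available for callers who really want the count, while list(iter(c)) walks the data only once.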