I'm struggeling to understand how the below code works. It's from http://docs.python.org/library/itertools.html#itertools.izip_longest, and is the pure-python equivalent of the izip_longest iterator. I'm especially mystified by the sentinel function, how does it work?
def izip_longest(*args, **kwds):
# izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
fillvalue = kwds.get('fillvalue')
def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
yield counter() # yields the fillvalue, or raises IndexError
fillers = repeat(fillvalue)
iters = [chain(it, sentinel(), fillers) for it in args]
try:
for tup in izip(*iters):
yield tup
except IndexError:
pass
Ok, we can do this. About the sentinel. The expression
([fillvalue]*(len(args)-1))
creates a list that contains one fill value for each iterable inargs
, minus one. So, for the example above['-']
.counter
is then assigned thepop
-function of that list.sentinel
itself is a generator that pops one item from that list on each iteration. You can iterate over each iterator returned bysentinel
exactly once, and it will always yieldfillvalue
. The total number of items yielded by all iterators returned bysentinel
islen(args) - 1
(thanks to Sven Marnach for clarifying that, I misunderstood it).Now check out this:
That is the trick.
iters
is a list that contains an iterator for each iterable inargs
. Each of these iterators does the following:args
.fillvalue
.fillvalue
for all eternity.Now, as established earlier, we can only iterate over all sentinels together
len(args)-1
times before it throws anIndexError
. This is fine, because one of the iterables is the longest. So, when we come to the point that theIndexError
is raised, that means we have finished iterating over the longest iterable inargs
.You are welcome.
P.S.: I hope this is understandable.
The definition of
sentinel
is almost equivalent toexcept that it gets the
pop
bound method (a function object) as a default argument. A default argument is evaluated at the time of function definition, so once per call toizip_longest
instead of once per call tosentinel
. Therefore, the function object "remembers" the list[fillvalue] * (len(args) - 1)
instead of constructing this anew in every call.The function
sentinel()
returns iterators yieldingfillvalue
exactly once. The total number offillvalue
s yielded by all iterators returned bysentinel()
is limited ton-1
, wheren
is the number of iterators passed toizip_longest()
. After this number offillvalue
s has been exhausted, further iteration over an iterator returned bysentinel()
will raise anIndexError
.This function is used to detect whether all iterators have been exhausted: Each iterator is
chain()
ed with an iterator returned bysentinel()
. If all iterators are exhausted, an iterator returned bysentinel()
will get iterated over for then
th time, resulting in anIndexError
, triggering the end ofizip_longest()
in turn.So far I explained what
sentinel()
does, not how it works. Whenizip_longest()
is called, the definition ofsentinel()
is evaluated. While evaluating the definition, also the default argument tosentinel()
is evaluated, once per call toizip_longest()
. The code is equivalent toStoring this in a default argument instead of in a variable in an enclosing scope is just an optimisation, as is the inclusion of
.pop
in the default argument, since it save looking it up every time an iterator returned bysentinel()
is iterated over.