I'm struggeling to understand how the below code works. It's from http://docs.python.org/library/itertools.html#itertools.izip_longest, and is the pure-python equivalent of the izip_longest iterator. I'm especially mystified by the sentinel function, how does it work?
def izip_longest(*args, **kwds):
# izip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-
fillvalue = kwds.get('fillvalue')
def sentinel(counter = ([fillvalue]*(len(args)-1)).pop):
yield counter() # yields the fillvalue, or raises IndexError
fillers = repeat(fillvalue)
iters = [chain(it, sentinel(), fillers) for it in args]
try:
for tup in izip(*iters):
yield tup
except IndexError:
pass
Ok, we can do this. About the sentinel. The expression ([fillvalue]*(len(args)-1))
creates a list that contains one fill value for each iterable in args
, minus one. So, for the example above ['-']
. counter
is then assigned the pop
-function of that list. sentinel
itself is a generator that pops one item from that list on each iteration. You can iterate over each iterator returned by sentinel
exactly once, and it will always yield fillvalue
. The total number of items yielded by all iterators returned by sentinel
is len(args) - 1
(thanks to Sven Marnach for clarifying that, I misunderstood it).
Now check out this:
iters = [chain(it, sentinel(), fillers) for it in args]
That is the trick. iters
is a list that contains an iterator for each iterable in args
. Each of these iterators does the following:
- Iterate over all items in the corresponding iterable from
args
.
- Iterate over sentinel once, yielding
fillvalue
.
- Repeat
fillvalue
for all eternity.
Now, as established earlier, we can only iterate over all sentinels together len(args)-1
times before it throws an IndexError
. This is fine, because one of the iterables is the longest. So, when we come to the point that the IndexError
is raised, that means we have finished iterating over the longest iterable in args
.
You are welcome.
P.S.: I hope this is understandable.
The function sentinel()
returns iterators yielding fillvalue
exactly once. The total number of fillvalue
s yielded by all iterators returned by sentinel()
is limited to n-1
, where n
is the number of iterators passed to izip_longest()
. After this number of fillvalue
s has been exhausted, further iteration over an iterator returned by sentinel()
will raise an IndexError
.
This function is used to detect whether all iterators have been exhausted: Each iterator is chain()
ed with an iterator returned by sentinel()
. If all iterators are exhausted, an iterator returned by sentinel()
will get iterated over for the n
th time, resulting in an IndexError
, triggering the end of izip_longest()
in turn.
So far I explained what sentinel()
does, not how it works. When izip_longest()
is called, the definition of sentinel()
is evaluated. While evaluating the definition, also the default argument to sentinel()
is evaluated, once per call to izip_longest()
. The code is equivalent to
fillvalue_list = [fillvalue] * (len(args)-1)
def sentinel():
yield fillvalue_list.pop()
Storing this in a default argument instead of in a variable in an enclosing scope is just an optimisation, as is the inclusion of .pop
in the default argument, since it save looking it up every time an iterator returned by sentinel()
is iterated over.
The definition of sentinel
is almost equivalent to
def sentinel():
yield ([fillvalue] * (len(args) - 1)).pop()
except that it gets the pop
bound method (a function object) as a default argument. A default argument is evaluated at the time of function definition, so once per call to izip_longest
instead of once per call to sentinel
. Therefore, the function object "remembers" the list [fillvalue] * (len(args) - 1)
instead of constructing this anew in every call.