I am searching for a middle ground between Python's zip
and zip_longest
functions (from the itertools module), that exhausts all given iterators, but does not fill in anything. So, for example, it should transpose tuples like so:
(11, 12, 13 ), (11, 21, 31, 41),
(21, 22, 23, 24), --> (12, 22, 32, 42),
(31, 32 ), (13, 23, 43),
(41, 42, 43, 44), ( 24, 44)
(Spaces added for nicer graphical alignment.)
I managed to compose a crude a solution by cleaning out the fillvalue
s after zip_longest
.
def zip_discard(*iterables, sentinel = object()):
return map(
partial(filter, partial(is_not, sentinel)),
zip_longest(*iterables, fillvalue=sentinel))
Is there a way to do this without introducing the sentinels to begin with? Can this be improved using yield
? Which approach seems most efficient?
Both zip
and zip_longest
were designed to always generate tuples of equal length, you can define your own generator that doesn't care about the len with something like this:
def _one_pass(iters):
for it in iters:
try:
yield next(it)
except StopIteration:
pass #of some of them are already exhausted then ignore it.
def zip_varlen(*iterables):
iters = [iter(it) for it in iterables]
while True: #broken when an empty tuple is given by _one_pass
val = tuple(_one_pass(iters))
if val:
yield val
else:
break
If the data being zipped together is fairly large then skipping the exhausted iterators every time can be expensive, it may be more efficient to remove the finished iterators from iters
in the _one_pass
function like this:
def _one_pass(iters):
i = 0
while i<len(iters):
try:
yield next(iters[i])
except StopIteration:
del iters[i]
else:
i+=1
both of these versions would remove the need to create intermediate results or use temporary filler values.
You approach is good. I think using a sentinel is elegant. I might be considered a bit more pythonic to use a nested generator expression:
def zip_discard_gen(*iterables, sentinel=object()):
return ((entry for entry in iterable if entry is not sentinel)
for iterable in zip_longest(*iterables, fillvalue=sentinel))
This needs fewer imports because there is no need for partial()
or ne()
.
It is a bit faster too:
data = [(11, 12, 13 ),
(21, 22, 23, 24),
(31, 32 ),
(41, 42, 43, 44)]
%timeit [list(x) for x in zip_discard(*data)]
10000 loops, best of 3: 17.5 µs per loop
%timeit [list(x) for x in zip_discard_gen(*data)]
100000 loops, best of 3: 14.2 µs per loop
EDIT
A list comprehension version is a bit faster:
def zip_discard_compr(*iterables, sentinel=object()):
return [[entry for entry in iterable if entry is not sentinel]
for iterable in zip_longest(*iterables, fillvalue=sentinel)]
Timing:
%timeit zip_discard_compr(*data)
100000 loops, best of 3: 6.73 µs per loop
A Python 2 version:
from itertools import izip_longest
SENTINEL = object()
def zip_discard_compr(*iterables):
sentinel = SENTINEL
return [[entry for entry in iterable if entry is not sentinel]
for iterable in izip_longest(*iterables, fillvalue=sentinel)]
Timings
This version returns the same data structure as zip_varlen
by
Tadhg McDonald-Jensen:
def zip_discard_gen(*iterables, sentinel=object()):
return (tuple([entry for entry in iterable if entry is not sentinel])
for iterable in zip_longest(*iterables, fillvalue=sentinel))
It is about twice as fast:
%timeit list(zip_discard_gen(*data))
100000 loops, best of 3: 9.37 µs per loop
%timeit list(zip_varlen(*data))
10000 loops, best of 3: 18 µs per loop