With x = [1,2,3,4], I can get an iterator from i = iter(x).
With this iterator, I can use the zip function to create tuples of two items each.
>>> i = iter(x)
>>> zip(i,i)
[(1, 2), (3, 4)]
I can even use this syntax to get the same result.
>>> zip(*[i] * 2)
[(1, 2), (3, 4)]
How does this work? How do zip(i,i) and zip(*[i] * 2) work with an iterator?
An iterator is like a stream of items. You can only look at the items in the stream one at a time and you only ever have access to the first element. To look at something in the stream, you need to remove it from the stream and once you take something from the top of the stream, it's gone from the stream for good.
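A minimal sketch of that "stream" behaviour, using only the built-in iter and next (the variable names are just for illustration):

```python
x = [1, 2, 3, 4]
i = iter(x)

first = next(i)    # takes 1 off the front of the stream
second = next(i)   # takes 2; there is no way to get 1 back out of i
rest = list(i)     # consumes whatever is left: 3 and 4

# The iterator is now exhausted, but the underlying list is untouched.
leftover = list(i)
print(first, second, rest, leftover, x)
```

Once an item has been taken from i, the only way to see it again is to build a fresh iterator with iter(x).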
When you call zip(i, i), zip first looks at the first stream and takes an item out. Then it looks at the second stream (which happens to be the same stream as the first one) and takes an item out. Then it makes a tuple out of those two items and repeats this over and over until there is nothing left in the stream.
Maybe it's easier to see if I were to write the zip function in pure Python (with only 2 arguments for simplicity). It would look something like this [1]:
def zip(a, b):
    out = []
    try:
        while True:
            # Take one item from each argument, in order.
            item1 = next(a)
            item2 = next(b)
            out.append((item1, item2))
    except StopIteration:
        # Either argument ran out of items, so we are done.
        return out
Now imagine the case that you are talking about, where a and b are the same object. In that case, each pass through the loop just calls next twice on the iterator (i in your example), which takes the next two items from i in sequence and packs them into a tuple.
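To make that concrete, here is the loop unrolled by hand for the case where both arguments are the same iterator. This is only an illustrative trace, not how zip is actually implemented; the names a, b, out and exhausted are chosen for the example:

```python
i = iter([1, 2, 3, 4])
a = b = i                        # a and b are the same iterator object, as in zip(i, i)

out = []
out.append((next(a), next(b)))   # first pass: takes 1, then 2
out.append((next(a), next(b)))   # second pass: takes 3, then 4

try:
    next(a)                      # third pass: nothing is left in the stream
    exhausted = False
except StopIteration:
    exhausted = True             # this is the point where the loop would stop

print(out, exhausted)
```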
Once we've understood why zip(i, i) behaves the way it does, zip(*([i] * 2)) isn't too hard. Let's read the expression from the inside out...
[i] * 2
That just creates a new list (of length 2) where both of the elements are references to the same iterator i. So it's the same thing as zip(*[i, i]) (it's just more convenient to write when you want to repeat something many more than 2 times).
Argument unpacking with * is a common idiom in Python, and you can find more information in the Python tutorial. The gist of it is that Python takes the iterable and "unpacks" it as if each item of the iterable were a separate positional argument to the function. So:
zip(*[i, i])
does the same thing as:
zip(i, i)
And now Bob's our uncle. We've just come full circle, since zip(i, i) is where this discussion started.
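The "repeat something many more than 2 times" case can be sketched as a small helper. The name chunks and its parameters are hypothetical, made up for this example; the result is wrapped in list() so the sketch also works on Python 3, where zip returns a lazy iterator instead of a list:

```python
def chunks(iterable, n):
    """Group items into tuples of length n (illustrative helper, not a built-in)."""
    it = iter(iterable)
    refs = [it] * n                      # n references to ONE iterator, not n copies
    assert all(r is it for r in refs)
    return list(zip(*refs))              # same as zip(it, it, ..., it) with n arguments

print(chunks([1, 2, 3, 4], 2))           # [(1, 2), (3, 4)]
print(chunks([1, 2, 3, 4, 5, 6], 3))     # [(1, 2, 3), (4, 5, 6)]
print(chunks([1, 2, 3, 4, 5], 2))        # [(1, 2), (3, 4)]; the trailing 5 is dropped
```

Note that zip stops as soon as any argument runs out, so a trailing partial group is silently discarded.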
[1] This example code is simplified in more ways than just accepting only 2 arguments. For example, the real zip is probably going to call iter on the input arguments so that it works for any iterable (not just iterators), but this should be enough to get the point across...
Every time you get an item from an iterator, it stays at that spot rather than "rewinding." So zip(i, i) gets the first item from i, then the second item from i, and returns that as a tuple. It continues to do this for each available pair, until the iterator is exhausted.
zip(*[i]*2) creates the list [i, i] by multiplying the list [i] by 2, then unpacks it with the * at the far left, which, in effect, sends two arguments i and i to zip, producing the same result as the first snippet.
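A short comparison may help show that the pairing comes from sharing one iterator rather than from zip itself. This is a sketch with illustrative variable names; list() is used so it behaves the same on Python 3, where zip is lazy:

```python
x = [1, 2, 3, 4]

i = iter(x)
shared = list(zip(i, i))                    # one iterator, advanced twice per tuple

independent = list(zip(iter(x), iter(x)))   # two iterators, each with its own position

again = list(zip(i, i))                     # i was already exhausted by the first call

print(shared)       # [(1, 2), (3, 4)]
print(independent)  # [(1, 1), (2, 2), (3, 3), (4, 4)]
print(again)        # []
```

The last line also means that reusing the same i for a second zip call in one session yields nothing; a fresh iter(x) is needed each time.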