How would I turn a generator of pairs (tuples):
tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])
into two generators which would yield [1, 2, 3] and ["a", "b", "c"]?
I need to process the first and second elements of the tuples separately, and the processing functions expect an iterable.
The generator is very large (millions of items) so I'd like to avoid having all items in memory at the same time unless there is no other solution.
There's a fundamental problem here. Say you get your two iterators iter1 and iter2, and you pass iter1 to a function that eats the whole thing. That's going to need to iterate through all of tuple_gen to get the first items, and then what do you do with the second items? Unless you're okay with rerunning the generator to get the second items again, you need to store all of them, in memory unless you can persist them to disk or something, so you're not much better off than if you'd just dumped tuple_gen into a list.
If you do this, you have to consume the iterators in parallel, or run the underlying generator twice, or spend a lot of memory saving the tuple elements you're not processing so the other iterator can go over them. Unfortunately, consuming the iterators in parallel will require either rewriting the consumer functions or running them in separate threads. Running the generator twice is simplest if you can do it, but not always an option.
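As a rough sketch of the "run the generator twice" option: this assumes the pairs come from something you can call again. The function make_pairs below is a hypothetical stand-in for whatever produces tuple_gen, and sum and "".join stand in for the real consumer functions.

```python
def make_pairs():
    # Hypothetical stand-in for the real (expensive, very large) source of pairs.
    yield from [(1, "a"), (2, "b"), (3, "c")]

# Each generator expression gets its own fresh run of the source,
# so neither one has to buffer items for the other.
firsts = (first for first, _ in make_pairs())
seconds = (second for _, second in make_pairs())

# Consumers that eat a whole iterable can now run one after the other.
total = sum(firsts)
joined = "".join(seconds)
```

The cost is that the source does its work twice, which is only acceptable if producing the pairs is cheap and repeatable.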
You can use itertools for this.
Note: for Python 3, izip would be just zip, and imap just map.
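A minimal Python 3 sketch of one way to do this with itertools, assuming tee combined with the built-in map and operator.itemgetter (on Python 2, imap would take the place of map). This is an illustration, not necessarily the exact combination the note refers to.

```python
from itertools import tee
from operator import itemgetter

tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])

# tee returns two independent iterators over the same underlying generator.
pairs_a, pairs_b = tee(tuple_gen)

# map is lazy in Python 3, so nothing is pulled from tuple_gen yet.
firsts = map(itemgetter(0), pairs_a)
seconds = map(itemgetter(1), pairs_b)

first_list = list(firsts)
second_list = list(seconds)
```

Note that consuming firsts completely before touching seconds makes tee hold every pair in memory until seconds catches up.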
Case 1
I don't know where [(1, "a"), (2, "b"), (3, "c")] comes from, but if it is built from two separate sources, gen1 and gen2, you can use gen1 and gen2 directly.
Case 2
If you've already created the list [(1, "a"), (2, "b"), (3, "c")] and just don't want to create it twice, you can iterate over that one list with two generators.
Case 3
Otherwise, just create one list, but expanding the generator costs CPU time, which is what you wanted to avoid.
I don't think there is any way to reset a generator or iterator to its first position.
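Cases 1 and 2 could look roughly like this. The functions numbers and letters are hypothetical stand-ins for whatever feeds gen1 and gen2, and pairs stands for the list that already exists.

```python
# Case 1: the pairs would have been zipped from two separate sources,
# so skip the zip and hand each source to its own consumer.
def numbers():
    yield from [1, 2, 3]      # hypothetical source behind gen1

def letters():
    yield from ["a", "b", "c"]  # hypothetical source behind gen2

gen1 = numbers()
gen2 = letters()
case1 = (list(gen1), list(gen2))

# Case 2: the list already exists, so two lazy generator expressions
# can walk it independently without copying it.
pairs = [(1, "a"), (2, "b"), (3, "c")]
firsts = (p[0] for p in pairs)
seconds = (p[1] for p in pairs)
case2 = (list(firsts), list(seconds))
```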
You can create n distinct iterators using the tee function from the itertools package. You would then iterate over them separately.
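A small sketch of the tee approach with n = 2. The lockstep loop at the end is an assumption about how the results might be consumed, chosen because it keeps tee's internal buffer small; the upper() call is only a placeholder for real processing.

```python
from itertools import tee

tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])

# Two independent iterators over the same generator.
it1, it2 = tee(tuple_gen, 2)
firsts = (pair[0] for pair in it1)
seconds = (pair[1] for pair in it2)

# Advancing both iterators together means tee never has to hold
# more than one pending pair for the iterator that lags behind.
result = [(f, s.upper()) for f, s in zip(firsts, seconds)]
```

If one iterator is exhausted before the other is started, tee ends up storing all the items, and at that point a plain list is usually the better choice.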