Turning a generator of pairs into a pair of generators

Posted 2019-04-07 17:27

How would I turn a generator of pairs (tuples):

tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])

Into two generators which would yield [1, 2, 3] and ["a", "b", "c"]?

I need to process the first and second elements of the tuples separately, and the processing functions expect an iterable.

The generator is very large (millions of items) so I'd like to avoid having all items in memory at the same time unless there is no other solution.

4 answers
叛逆
#2 · 2019-04-07 18:09

There's a fundamental problem here. Say you get your two iterators iter1 and iter2, and you pass iter1 to a function that eats the whole thing:

def consume(iterable):
    for thing in iterable:
        do_stuff_with(thing)

consume(iter1)

That has to iterate through all of tuple_gen to get the first items, so what happens to the second items? Unless you're okay with rerunning the generator to produce them again, you have to store all of them, and in memory unless you can persist them to disk or something. At that point you're not much better off than if you'd just dumped tuple_gen into a list.
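If rerunning is acceptable, it could look roughly like the sketch below. It assumes the pairs come from something you can call again; make_pairs is a hypothetical name standing in for whatever actually builds tuple_gen.

```python
def make_pairs():
    """Hypothetical factory: returns a fresh generator of pairs on every call."""
    return ((n, ch) for n, ch in zip(range(1, 4), "abc"))


# Each projection gets its own run of the underlying generator,
# so no pair elements have to be buffered for the other consumer.
firsts = (pair[0] for pair in make_pairs())
seconds = (pair[1] for pair in make_pairs())
```

The cost is that whatever work produces the pairs is done twice.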


If you do this, you have to consume the iterators in parallel, or run the underlying generator twice, or spend a lot of memory saving the tuple elements you're not processing so the other iterator can go over them. Unfortunately, consuming the iterators in parallel will require either rewriting the consumer functions or running them in separate threads. Running the generator twice is simplest if you can do it, but not always an option.
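As a rough illustration of the "separate threads" option, here is a minimal sketch that feeds each half of every pair into a bounded queue and runs each consumer in its own thread. The names split_pairs, consume_first and consume_second are made up for this example, and it assumes both consumers read their iterable to the end; a consumer that stops early or raises would leave the producer blocked.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def split_pairs(pairs, consume_first, consume_second, maxsize=1024):
    """Run two consumers in parallel, one per tuple position."""
    queues = [queue.Queue(maxsize), queue.Queue(maxsize)]
    results = [None, None]

    def drain(q):
        # Present a queue to the consumer as an ordinary iterator.
        while True:
            item = q.get()
            if item is _DONE:
                return
            yield item

    def run(index, consumer):
        results[index] = consumer(drain(queues[index]))

    threads = [
        threading.Thread(target=run, args=(0, consume_first)),
        threading.Thread(target=run, args=(1, consume_second)),
    ]
    for t in threads:
        t.start()

    # Single pass over the source; put() blocks when a queue is full,
    # so at most about maxsize items per side are held in memory.
    for first, second in pairs:
        queues[0].put(first)
        queues[1].put(second)
    for q in queues:
        q.put(_DONE)
    for t in threads:
        t.join()
    return tuple(results)
```

For example, split_pairs(tuple_gen, sum, "".join) would hand the numbers to sum and the letters to "".join without ever holding the full stream.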

爷、活的狠高调
#3 · 2019-04-07 18:09

You can use itertools as follows (Python 2):

>>> from itertools import chain, izip, imap
>>> tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])
>>> nums_gen, letters_gen = imap(lambda x: chain(x), izip(*tuple_gen))
>>> list(nums_gen)
[1, 2, 3]
>>> list(letters_gen)
['a', 'b', 'c']

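Note:

In Python 3, izip is just zip and imap is just map. Also be aware that the * in izip(*tuple_gen) unpacks the entire generator into arguments before anything is produced, so every item is in memory at once.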
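For reference, a Python 3 version of the same idea might look like the sketch below. It is not a memory-saving approach: zip(*tuple_gen) has to read the whole generator before it returns, so this only suits inputs that fit in memory.

```python
tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])

# zip(*...) transposes the pairs; both result tuples are fully built here.
nums, letters = zip(*tuple_gen)

# Wrap the tuples as iterators for functions that expect an iterable.
nums_gen, letters_gen = iter(nums), iter(letters)
```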

看我几分像从前
#4 · 2019-04-07 18:12

Case 1

I don't know where [(1, "a"), (2, "b"), (3, "c")] comes from, but if it is built like the code below:

gen1 = (i for i in  [1,2,3])
gen2 = (i for i in ["a", "b", "c"])
tuple_gen = (i for i in zip(gen1, gen2))

You can use gen1 and gen2 directly.

Case 2

If you've already created the list [(1, "a"), (2, "b"), (3, "c")] and just don't want to create it twice, you can do the following:

lst = [(1, "a"), (2, "b"), (3, "c")]
gen1 = (i[0] for i in lst)
gen2 = (i[1] for i in lst)

Case 3

Otherwise, just create one list, but expanding the generator costs CPU time and memory, which is what you wanted to avoid.

tuple_gen = (i for i in [(1, "a"), (2, "b"), (3, "c")])
tmp = list(tuple_gen)
gen1 = iter(tmp)
gen2 = iter(tmp)

As far as I know, there is no way to reset a generator or iterator to its first position.

爱情/是我丢掉的垃圾
#5 · 2019-04-07 18:19

You can create n independent iterators using the tee function from the itertools module. You would then iterate over them separately:

from itertools import tee

# tee takes n positionally; it does not accept n as a keyword argument.
i1, i2 = tee(tuple_gen, 2)
firsts = (x[0] for x in i1)
seconds = (x[1] for x in i2)
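
One caveat worth spelling out: tee has to buffer every item that one iterator has seen and the other has not. The sketch below uses a small made-up input (the pairs generator is only for illustration) and consumes the first iterator completely before touching the second, which is exactly the pattern that makes tee hold the whole stream.

```python
from itertools import tee

# Illustrative input only: pairs of a number and its string form.
pairs = ((n, str(n)) for n in range(5))

i1, i2 = tee(pairs)
firsts = (x[0] for x in i1)
seconds = (x[1] for x in i2)

# Exhausting firsts forces tee to keep all five pairs queued for i2.
total = sum(firsts)
labels = list(seconds)
```

If the two consumers can be advanced in step with each other, the buffer stays small; if one runs to completion first, building a list is usually just as good.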