Consuming two iterators in parallel

Suppose I have two iterators, and I want to compute

fancyoperation1(iter1), fancyoperation2(iter2)

Normally, I would simply evaluate that expression as written. However, if the two iterators are linked to a single source (for example, teed from one underlying iterator), evaluating them one after the other forces a lot of temporary data to be kept in memory. In that case, I know of several options:

I could rewrite fancyoperation1 and fancyoperation2 into a single function that does both at the same time, but that may be a lot of code duplication, and I may not understand or have the source code for either function. Also, this would need to be done anew for every pair of operations.
I could use threading. The synchronization can probably be written once in a helper function, and the overhead probably wouldn't be too bad as long as I don't need to switch threads too often.
I could keep a lot of temporary data in memory.
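To make the memory concern behind these options concrete, here is a minimal sketch of what happens when two teed iterators are consumed one after the other. The `source` generator and the `log` list are hypothetical helpers added only to observe how many items have been pulled; `sum` and `max` stand in for the two fancy operations.

```python
from itertools import tee

def source(n, log):
    # Hypothetical single source; records every item it hands out.
    for i in range(n):
        log.append(i)
        yield i

log = []
iter1, iter2 = tee(source(5, log))

total = sum(iter1)              # stand-in for fancyoperation1(iter1)
pulled_before_iter2 = len(log)  # the whole source has already been consumed
biggest = max(iter2)            # stand-in for fancyoperation2(iter2), fed from tee's buffer
```

Because `iter1` runs to completion before `iter2` starts, `tee` has to hold every item for `iter2` in the meantime, which is the "lot of temporary data" in question.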

I don't really like the drawbacks of those options, though. Is there a way to do what I want in one thread, without rewriting things or using large amounts of memory? I tried to do it with coroutines, but Python's yield doesn't seem to be powerful enough.

(I do not currently have this problem, but I'm wondering what to do if it ever comes up.)

Tags: python parallel-processing

1 Answer

我命由我不由天

Reply #2 · 2020-04-11 14:42

You absolutely can use coroutines for this; it's just slightly less convenient. On the bright side, the two operations stay separate and most of their code is left unaltered. Change the fancy operations so that they take no iterator parameter and instead fetch each item with yield (used as an expression), rather than accepting a parameter and iterating over it. In other words, change this:

def fancyoperation1(it):
    for x in it:
        ...
    cleanup()

# into something like this

def fancyoperation1():
    while True:
        try:
            x = yield
        except GeneratorExit:
            break
        ...
    cleanup()
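As a variation on the pattern above (not part of the original answer), the cleanup can also be placed in a `finally` clause, so that it runs both when the coroutine is closed and when processing an item raises. This is only a sketch: `collect` and `sink` are hypothetical names, and appending `"done"` stands in for `cleanup()`.

```python
def collect(sink):
    try:
        while True:
            x = yield          # receive the next item from the driver
            sink.append(x)     # stand-in for the real per-item work
    finally:
        # Runs when close() raises GeneratorExit at the yield,
        # and also if the per-item work raises.
        sink.append("done")    # stand-in for cleanup()

sink = []
c = collect(sink)
next(c)        # advance to the first yield
c.send(1)
c.send(2)
c.close()
```

Whether cleanup should also run on errors depends on the operation, so the explicit `except GeneratorExit` form may still be the better fit in some cases.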

Of course, it's easier if there is no post-iteration cleanup to be done. You can use these like this (here underlying_iterator is the single source from which iter1 and iter2 would otherwise have been teed):

f1, f2 = fancyoperation1(), fancyoperation2()
f1.send(None) # start coroutines
f2.send(None)

for x in underlying_iterator:
    f1.send(x)
    f2.send(x)
f1.close()
f2.close()
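For reference, here is a self-contained sketch of the whole approach with two concrete operations. The names `summing`, `maxing`, `broadcast`, and the shared `results` dict are assumptions made for illustration; the dict is used because a value returned from a generator is not reliably available through `close()` across Python versions.

```python
def summing(results):
    total = 0
    while True:
        try:
            x = yield
        except GeneratorExit:
            break
        total += x
    results["sum"] = total      # post-iteration step: publish the result

def maxing(results):
    best = None
    while True:
        try:
            x = yield
        except GeneratorExit:
            break
        if best is None or x > best:
            best = x
    results["max"] = best       # post-iteration step: publish the result

def broadcast(iterable, *coroutines):
    # Hypothetical helper: feed every item to every coroutine, one at a time.
    for c in coroutines:
        next(c)                 # same effect as c.send(None): run to the first yield
    for x in iterable:
        for c in coroutines:
            c.send(x)
    for c in coroutines:
        c.close()               # raises GeneratorExit inside each coroutine

results = {}
broadcast(iter(range(10)), summing(results), maxing(results))
```

Each item is handed to both operations before the next one is pulled from the source, so nothing needs to be buffered and everything runs in one thread.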
