Consuming two iterators in parallel

Posted 2020-04-11 14:26

Question:

Suppose I have two iterators, and I want to compute

fancyoperation1(iter1), fancyoperation2(iter2)

Normally, I would simply evaluate that expression as written. However, if the two iterators draw from a single source, for example because they were teed from one iterator, the first operation runs to completion before the second one starts, so every item has to be buffered in memory until the second operation consumes it. In that case, I know of several options:

  • I could rewrite fancyoperation1 and fancyoperation2 into a single function that does both at the same time, but that may be a lot of code duplication, and I may not understand or have the source code for either function. Also, this would need to be done anew for every pair of operations.
  • I could use threading. The synchronization can probably be written once in a helper function, and the overhead probably wouldn't be too bad as long as I don't need to switch threads too often.
  • I could keep a lot of temporary data in memory.
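As a rough illustration of the threading option, the helper could look something like the sketch below. The names run_in_parallel, _drain and maxsize are made up for this example, and it assumes each operation takes a single iterable and returns a result. Bounded queues cap the memory used per operation, and each worker keeps draining its queue after its operation returns so that the feeding loop cannot block forever on an operation that stops early.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def _drain(q):
    """Yield items from q until the sentinel arrives."""
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


def run_in_parallel(source, *operations, maxsize=64):
    """Feed every item of source to each operation, one thread per operation.

    Hypothetical helper: at most maxsize items are buffered per operation.
    """
    queues = [queue.Queue(maxsize=maxsize) for _ in operations]
    results = [None] * len(operations)

    def worker(index, op, q):
        it = _drain(q)
        try:
            results[index] = op(it)
        finally:
            # If op stopped early (or raised), discard the remaining items
            # so the producer never blocks on a full queue.
            for _ in it:
                pass

    threads = [
        threading.Thread(target=worker, args=(i, op, q))
        for i, (op, q) in enumerate(zip(operations, queues))
    ]
    for t in threads:
        t.start()
    for item in source:
        for q in queues:
            q.put(item)  # blocks while a consumer is maxsize items behind
    for q in queues:
        q.put(_DONE)
    for t in threads:
        t.join()
    return results
```

With this, run_in_parallel(iter1_source, fancyoperation1, fancyoperation2) would return both results as a list. An exception raised inside an operation is not propagated to the caller in this sketch; its slot in the result list simply stays None.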

I don't really like the drawbacks of those options, though. Is there a way to do what I want in one thread, without rewriting things or using large amounts of memory? I tried to do it with coroutines, but Python's yield doesn't seem to be powerful enough.

(I do not currently have this problem, but I'm wondering what to do if it ever comes up.)

Answer 1:

You absolutely can use coroutines for this; it's just slightly less convenient. On the bright side, the two operations stay separate and most of their code is left unaltered. Change each fancy operation so that it takes no iterator argument and instead repeatedly uses yield (as an expression) to receive the next item. In other words, change this:

def fancyoperation1(it):
    for x in it:
        ...
    cleanup()

# into something like this

def fancyoperation1():
    while True:
        try:
            x = yield
        except GeneratorExit:
            break
        ...
    cleanup()

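Of course, it's easier if there is no post-iteration cleanup to be done, since the try/except around yield can then be dropped. You can use the coroutines like this (here underlying_iterator is the single source that iter1 and iter2 would otherwise have been teed from):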

f1, f2 = fancyoperation1(), fancyoperation2()
f1.send(None) # start coroutines
f2.send(None)

for x in underlying_iterator:
    f1.send(x)
    f2.send(x)
f1.close()
f2.close()
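To make the pattern concrete, here is a minimal self-contained sketch. The coroutines running_sum and running_max, the broadcast helper, and the results dict are all invented for this example and stand in for the real fancy operations. Because close() traditionally discards whatever the generator returns, each coroutine stores its result in a shared dict during its post-iteration step.

```python
def running_sum(out):
    """Coroutine: adds up everything sent to it."""
    total = 0
    while True:
        try:
            x = yield
        except GeneratorExit:  # raised at the yield when close() is called
            break
        total += x
    out["sum"] = total  # post-iteration step


def running_max(out):
    """Coroutine: tracks the largest value sent to it."""
    best = None
    while True:
        try:
            x = yield
        except GeneratorExit:
            break
        if best is None or x > best:
            best = x
    out["max"] = best


def broadcast(source, *coroutines):
    """Send every item of source to each coroutine, then close them all."""
    for co in coroutines:
        next(co)  # equivalent to co.send(None): run up to the first yield
    for x in source:
        for co in coroutines:
            co.send(x)
    for co in coroutines:
        co.close()


results = {}
broadcast(iter([3, 1, 4, 1, 5]), running_sum(results), running_max(results))
print(results)  # {'sum': 14, 'max': 5}
```

Only one item from the source is alive at a time, so memory use stays constant no matter how long the underlying iterator is.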