I am trying to use IPython.parallel map. The inputs to the function I wish to parallelize are generators. Because of size/memory it is not possible for me to convert the generators to lists. See code below:
from itertools import product
from IPython.parallel import Client
c = Client()
v = c[:]
c.ids
def stringcount(longstring, substrings):
    scount = [longstring.count(s) for s in substrings]
    return scount
substrings = product('abc', repeat=2)
longstring = product('abc', repeat=3)
# This is what I want to do in parallel
# It should be 'for longs in longstring'; I use range() because it can get long.
for num in range(9):
    longs = next(longstring)
    subs = next(substrings)
    print(subs, longs)
    count = stringcount(longs, subs)
    print(count)
# This does not work, and I understand why.
# I don't know how to fix it while keeping longstring and substrings as
# generators
v.map(stringcount, longstring, substrings)
for r in v:
    print(r.get())
I took a slightly different approach to your problem that may be useful to others. Below, I attempted to mimic the behavior of the multiprocessing.pool.Pool.imap method by wrapping IPython.parallel.map. This required me to rewrite your functions slightly. The output I'm seeing is in this Notebook: http://nbviewer.ipython.org/gist/driscoll/b8de4bf980de1ad890de
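A minimal sketch of that kind of wrapper, not the notebook's own code: it pulls a fixed-size chunk from the input generators, maps that chunk on the cluster, yields the results, and only then pulls the next chunk. The name imap_chunked and the chunksize parameter are made up for illustration; the only thing assumed of the view is a map_sync(func, *sequences) method, which IPython.parallel views provide.

```python
from itertools import islice

def imap_chunked(view, func, *iterables, chunksize=100):
    """Lazily yield func(*args), sending `chunksize` argument tuples per round."""
    args = zip(*iterables)  # lazy: nothing is consumed from the generators yet
    while True:
        chunk = list(islice(args, chunksize))  # at most `chunksize` tuples in memory
        if not chunk:
            return  # inputs exhausted
        # Transpose the chunk back into one list per function argument.
        columns = [list(col) for col in zip(*chunk)]
        for result in view.map_sync(func, *columns):
            yield result
```

With a connected view this would be called as imap_chunked(v, stringcount, longstring, substrings), with both inputs left as generators. A larger chunksize means fewer round trips but more items held in memory at once.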
You can't use View.map with a generator without walking through the entire generator first. But you can write your own custom function to submit batches of tasks from a generator and wait for them incrementally. I don't have a more interesting example, but I can illustrate with a terrible implementation of a prime search. Start with our token 'data generator':
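A minimal sketch of such a generator. The name generate_possible_factors and the choice of candidates (2, then odd integers up to the square root) are assumptions for illustration:

```python
from math import isqrt

def generate_possible_factors(N):
    """Yield candidate factors of N: 2, then every odd integer up to sqrt(N)."""
    if N <= 3:
        return  # nothing to test for N <= 3
    yield 2
    f = 3
    last = isqrt(N)
    while f <= last:
        yield f
        f += 2
```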
This just generates a sequence of integers to use when testing if a number is prime.
Now our trivial function that we will use as a task with IPython.parallel, and a complete implementation of prime check using the generator and our factor function:
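A sketch of both pieces, with hypothetical names is_factor and dumb_prime. So that the block runs on its own, the candidate generator is passed in as the factors argument rather than created inside the function:

```python
def is_factor(f, N):
    """Return True if f divides N evenly."""
    return N % f == 0

def dumb_prime(N, factors):
    """Serial check: N is prime if no candidate from `factors` divides it."""
    for f in factors:
        if is_factor(f, N):
            return False  # stop early; the rest of the generator is never consumed
    return True

candidates = (f for f in range(2, 10))
print(dumb_prime(91, candidates))  # False, since 91 = 7 * 13
print(next(candidates))            # 8: the generator was only consumed up to 7
```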
A parallel version that only submits a limited number of tasks at a time:
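A sketch of how this could look, under a few assumptions: the name parallel_dumb_prime and its parameters are illustrative, the candidate generator is passed in as factors, and the view only needs apply_async(func, *args) returning objects with ready() and get(), which IPython.parallel views and their AsyncResults provide. It polls with time.sleep rather than any cluster-side wait call.

```python
import time

def is_factor(f, N):
    """Task run on an engine: does f divide N evenly?"""
    return N % f == 0

def parallel_dumb_prime(N, view, factors, max_outstanding=10, dt=0.1):
    """Check candidates from `factors` remotely, at most `max_outstanding` at a time."""
    factors = iter(factors)
    pending = set()
    exhausted = False
    while True:
        # Top up the pending set, pulling only as many items as there are free slots.
        while not exhausted and len(pending) < max_outstanding:
            try:
                f = next(factors)
            except StopIteration:
                exhausted = True
            else:
                pending.add(view.apply_async(is_factor, f, N))
        if not pending:
            return True  # every candidate was checked and none divides N
        done = set(task for task in pending if task.ready())
        if not done:
            time.sleep(dt)  # nothing finished yet; poll again shortly
            continue
        for task in done:
            if task.get():
                return False  # found a factor: stop without consuming more input
        pending -= done
```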
This submits a limited number of tasks at a time, and as soon as we know that N is not prime, we stop consuming the generator.
To use this function:
A more complete illustration in a notebook.