I am trying to use the `multiprocessing` package to call a function (let's call it `myfunc`) in parallel, specifically using `pool.map`, i.e. `pool.map(myfunc, myarglist)`. When I simply loop over `myarglist` without using `multiprocessing` there are no errors, which should be the case because all operations in `myfunc` are wrapped in a `try` block. However, when I call the function using `pool.map`, the script invariably stops running: it stops printing the "myfunc done!" statement inside my function and the processes stop using the CPUs, but it never returns `resultlist`. I am running Python 2.7 from the terminal on Ubuntu 12.04. What could cause this, and how should I fix or troubleshoot the problem?
import multiprocessing
from multiprocessing import Pool

cpu_count = multiprocessing.cpu_count()  # already returns an int
pool = Pool(processes=cpu_count)
resultlist = pool.map(myfunc, myarglist)
pool.close()
pool.join()  # wait for the workers to exit
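One way to troubleshoot a hang like this is to replace the blocking `pool.map` call with `map_async` plus a timeout, so the hang surfaces as a `TimeoutError` instead of blocking forever. A minimal sketch, using the question's `myfunc`/`myarglist` names with a stand-in worker (the timeout value is an assumption; adjust it to your workload):

import multiprocessing
from multiprocessing import Pool

def myfunc(arg):
    # Stand-in worker; replace with your real function.
    return arg * 2

if __name__ == '__main__':
    myarglist = range(10)
    pool = Pool(processes=multiprocessing.cpu_count())
    # map_async returns immediately; get() re-raises worker exceptions
    # and raises TimeoutError if the results never arrive.
    async_result = pool.map_async(myfunc, myarglist)
    try:
        resultlist = async_result.get(timeout=60)  # seconds
        print("got %d results" % len(resultlist))
    except multiprocessing.TimeoutError:
        print("pool.map hung; suspect unpicklable arguments or results")
    finally:
        pool.close()
        pool.join()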
Update
One issue when using `multiprocessing` can be the size of the objects being passed; if you think that may be the problem, see this answer. As that answer notes, "If this [solution] doesn't work, maybe the stuff you're returning from your functions is not pickleable, and therefore unable to make it through the Queues properly." `multiprocessing` passes objects between processes by pickling them. It turned out that one or two of my objects contained soup from `BeautifulSoup` that would not pickle.
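A quick way to check for this is to run each return value through `pickle.dumps` in the ordinary single-process loop, before involving the pool; anything unpicklable raises an exception immediately. A sketch, again assuming `myfunc` and `myarglist` from above:

import pickle

for arg in myarglist:
    result = myfunc(arg)
    try:
        pickle.dumps(result)  # same serialization multiprocessing relies on
    except Exception as exc:
        print("result for %r is not picklable: %s" % (arg, exc))

If a BeautifulSoup object is the culprit, one common workaround is to extract what you actually need inside `myfunc` (e.g. `str(soup)` or specific tag text) and return plain strings instead of the soup object itself.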