Python: how to parallelize loops

Posted 2020-05-04 07:21

Question:

I am very new to multithreading and multiprocessing, and I am trying to parallelize a for loop. I searched for similar questions and wrote the following code based on the multiprocessing module.

import timeit, multiprocessing

start_time = timeit.default_timer()

d1 = dict( (i,tuple([i*0.1,i*0.2,i*0.3])) for i in range(500000) )
d2={}

def fun1(gn):
    for i in gn:
        x,y,z = d1[i]
        d2.update({i:((x+y+z)/3)})


if __name__ == '__main__':
    gen1 = [x for x in d1.keys()]
    fun1(gen1)
    #p= multiprocessing.Pool(3)
    #p.map(fun1,gen1)

    print('Script finished')
    stop_time = timeit.default_timer()
    print(stop_time - start_time)

# Output:

Script finished
0.8113944193950299

If I change the code to:

#fun1(gen1)
p= multiprocessing.Pool(5)
p.map(fun1,gen1)

I get errors:

for i in gn:
TypeError: 'int' object is not iterable
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    raise self._value

Any ideas on how to make this parallel? MATLAB has a parfor construct for parallel loops, and I am trying to get the same effect with this approach, but it is not working. Also, what if the function returns values: can I write something like a,b,c=p.map(fun1,gen1) if fun1() returns 3 values?

(Running Python 3.6 on Windows.)

Answer 1:

As @Alex Hall mentioned, remove the iteration from fun1, because p.map already calls it once per element. Also, wait until all of the pool's workers have finished.

PEP 8 note: import timeit, multiprocessing is bad practice; put each import on its own line.

import multiprocessing
import timeit


start_time = timeit.default_timer()

d1 = dict( (i,tuple([i*0.1,i*0.2,i*0.3])) for i in range(500000) )
d2 = {}

def fun1(gn):
    x,y,z = d1[gn]
    d2.update({gn: ((x+y+z)/3)})


if __name__ == '__main__':
    gen1 = [x for x in d1.keys()]

    # serial processing
    for gn in gen1:
        fun1(gn)

    # parallel processing
    p = multiprocessing.Pool(3)
    p.map(fun1, gen1)
    p.close()
    p.join()

    print('Script finished')
    stop_time = timeit.default_timer()
    print(stop_time - start_time)
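
One caveat about the code above: each worker process has its own copy of d2, so the updates made through p.map never reach the parent process; the parent's d2 is filled only by the serial loop. A minimal sketch of one way around this, assuming the worker can simply return its result and the parent builds the dict. The names mean_item and main are hypothetical, and the input is shrunk to 1000 entries to keep the demo quick.

```python
import multiprocessing


def mean_item(item):
    """Return (key, mean of the three values) for one (key, (x, y, z)) pair."""
    key, (x, y, z) = item
    return key, (x + y + z) / 3


def main():
    # Smaller than the original 500000 entries so the demo runs quickly.
    d1 = {i: (i * 0.1, i * 0.2, i * 0.3) for i in range(1000)}

    # map() blocks until every result is back; the with-block then
    # shuts the pool down.
    with multiprocessing.Pool(3) as pool:
        results = pool.map(mean_item, list(d1.items()))

    # Build the result dict in the parent from the returned pairs.
    d2 = dict(results)
    print(len(d2))


if __name__ == '__main__':
    main()
```

Passing the (key, value) pairs as arguments also means the worker does not depend on a module-level d1, which on Windows would be rebuilt in every worker when the module is re-imported.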


Answer 2:

p.map does the looping for you, so remove the for i in gn: loop from fun1.

That is, p.map applies fun1 to each element of gen1, so gn is one of those elements.
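
On the question's last point: p.map returns a list with one result per input element, so a,b,c=p.map(fun1,gen1) would only unpack if gen1 had exactly three elements. If the function returns three values, the result is a list of 3-tuples, which can be transposed with zip. A small sketch, assuming a hypothetical worker fun3 and helper split_columns (neither is from the original code):

```python
import multiprocessing


def fun3(i):
    """Hypothetical worker: takes one element and returns three values."""
    return i, i * 2, i * 3


def split_columns(rows):
    """Transpose a non-empty list of 3-tuples into three tuples."""
    a, b, c = zip(*rows)
    return a, b, c


if __name__ == '__main__':
    with multiprocessing.Pool(3) as pool:
        # One 3-tuple per input element, in the same order as the input.
        rows = pool.map(fun3, range(5))

    a, b, c = split_columns(rows)
    print(a)
    print(b)
    print(c)
```

The zip(*rows) step needs at least one row; with an empty input the three-way unpacking raises a ValueError.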