I've been searching for an answer to this for days without success. I'm probably just not understanding the pieces floating around out there, and the Python documentation for the multiprocessing module is large and not very clear to me.
Say you have the following for loop:
```python
import timeit

numbers = []
start = timeit.default_timer()
for num in range(100000000):
    numbers.append(num)
end = timeit.default_timer()
print('TIME: {} seconds'.format(end - start))
print('SUM:', sum(numbers))
```
Output:

```
TIME: 23.965870224497916 seconds
SUM: 4999999950000000
```
For this example, say you have a 4-core processor. Is there a way to create 4 processes in total, each running on a separate CPU core, so that the loop finishes roughly 4 times faster (24 s / 4 processes = ~6 seconds)?
Could the for loop somehow be divided into 4 equal chunks, with the 4 chunks then combined into the numbers list so that it gives the same sum? There is this Stack Overflow thread, Parallel Simple For Loop, but I don't get it. Thanks, all.
So `Pool.map` is like the builtin `map` function. It takes a function and an iterable and produces a list of the results of calling that function on every element of the iterable. Here, since we don't actually want to change the elements of the range iterable, we just return the argument. The crucial thing is that `Pool.map` divides the provided iterable (`range(1000000000)` here) into chunks, sends them to the worker processes (4 here, as set by `Pool(4)`), and then rejoins the results into one list. The output I get when running this is:
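The answer's own code is not shown here, so the following is only a minimal sketch of the idea it describes, assuming a 4-process pool and a much smaller range so it runs quickly. The names `identity` and `parallel_numbers` are hypothetical. Note that this only moves numbers back and forth between processes, so most of the time goes into pickling and inter-process communication rather than useful work.

```python
from multiprocessing import Pool


def identity(x):
    # Return the argument unchanged. Pool.map needs a module-level (picklable)
    # function, so a lambda would not work here.
    return x


def parallel_numbers(n, processes=4):
    """Build the list [0, 1, ..., n - 1] by mapping identity() over range(n) in a pool."""
    with Pool(processes) as pool:
        # Pool.map cuts range(n) into chunks, hands them to the worker processes
        # and returns the results as one list in the original order.
        return pool.map(identity, range(n))


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing
    # this module (e.g. Windows and macOS).
    numbers = parallel_numbers(1_000_000)
    print('SUM:', sum(numbers))  # prints SUM: 499999500000
```

`Pool.map` also accepts an optional `chunksize` argument if you want to control how large the pieces sent to each worker are instead of letting the pool pick a size.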
I did a comparison; the time taken to split the task across processes can sometimes exceed the time it saves:
File `multiprocessing_summation.py`:

File `multiprocessing_summation_master.py`:

Run the second script:

```shell
python multiprocessing_summation_master.py 1000 100000 10000000 1000000000
```
The outputs are:
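The two scripts referred to above are not reproduced here. As a rough, hypothetical way to see the effect on your own machine, the sketch below times the fixed cost of starting a 4-process pool and running a trivial map, and compares it with plain single-process sums at a few sizes. The function names are made up for illustration, and the timings depend heavily on the OS, Python version and process start method, so no numbers are claimed here.

```python
import time
from multiprocessing import Pool


def noop(x):
    # A worker that does no real work, so the timing is almost pure overhead.
    return x


def pool_overhead(processes=4):
    """Seconds spent starting a pool, running a trivial map and shutting it down."""
    start = time.perf_counter()
    with Pool(processes) as pool:
        pool.map(noop, range(processes))
    return time.perf_counter() - start


def sequential_sum(n):
    """Return (sum(range(n)), elapsed seconds) for a plain single-process sum."""
    start = time.perf_counter()
    total = sum(range(n))
    return total, time.perf_counter() - start


if __name__ == "__main__":
    print(f"Pool(4) start-up + trivial map: {pool_overhead():.4f} s")
    # Sizes mirror the first three arguments of the command above.
    for n in (1_000, 100_000, 10_000_000):
        total, elapsed = sequential_sum(n)
        print(f"sum(range({n})) = {total} in {elapsed:.4f} s")
```

When the sequential time for a given size is smaller than the pool overhead, splitting the work across processes cannot pay off.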
Yes, that is doable. Your calculation does not depend on intermediate results, so you can easily divide the task into chunks and distribute them over multiple processes. It's what is called an embarrassingly parallel problem.
The only tricky part might be dividing the range into fairly equal parts in the first place. Here are two functions straight from my personal library that deal with this:
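The answerer's two library functions are not shown here. One hypothetical way to do the split, assuming you want contiguous pieces whose lengths differ by at most one, is to use `divmod` and hand the leftover elements to the first few chunks. `split_range` is an illustrative name, not the answer's API.

```python
def split_range(n, parts):
    """Split range(n) into `parts` contiguous ranges whose lengths differ by at most one."""
    size, remainder = divmod(n, parts)
    chunks = []
    start = 0
    for i in range(parts):
        # The first `remainder` chunks each take one extra element.
        stop = start + size + (1 if i < remainder else 0)
        chunks.append(range(start, stop))
        start = stop
    return chunks


if __name__ == "__main__":
    for chunk in split_range(10, 4):
        print(chunk)  # range(0, 3), range(3, 6), range(6, 8), range(8, 10)
```

Returning `range` objects rather than lists keeps the chunks cheap to create and to send to worker processes, since a range pickles as just its start, stop and step.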
Then your main script would look like this:
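The main script itself is missing here, so this is a hedged sketch of what such a script could look like, not the answerer's code. It assumes 4 worker processes and the question's `range(100000000)`. Each worker sums its own slice with `sum(range(start, stop))`, and only the 4 partial sums travel back to the parent. `partial_sum`, `chunk_bounds` and `parallel_sum` are hypothetical names.

```python
import timeit
from multiprocessing import Pool


def partial_sum(start, stop):
    """Sum the integers in [start, stop) without building a list."""
    return sum(range(start, stop))


def chunk_bounds(n, parts):
    """Return (start, stop) pairs covering range(n) in `parts` near-equal pieces."""
    return [(i * n // parts, (i + 1) * n // parts) for i in range(parts)]


def parallel_sum(n, processes=4):
    with Pool(processes) as pool:
        # starmap unpacks each (start, stop) tuple into partial_sum(start, stop).
        return sum(pool.starmap(partial_sum, chunk_bounds(n, processes)))


if __name__ == "__main__":
    n = 100_000_000
    start = timeit.default_timer()
    total = parallel_sum(n)
    end = timeit.default_timer()
    print('TIME: {} seconds'.format(end - start))
    print('SUM:', total)  # prints SUM: 4999999950000000
```

Because the workers return single integers instead of long lists, the communication cost stays small no matter how large `n` gets.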
Note that I also replaced your for loop with a simple sum over the range object, since it performs much better. If you can't do this in your real application, a list comprehension would still be ~60% faster than filling your list manually as in your example.
Example Output:
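The speed claims in the note above about the for loop, the list comprehension and the range sum are easy to check locally. The following is a small, hypothetical single-process benchmark using `timeit.timeit`. The function names and the size `n = 1_000_000` are arbitrary choices, and the relative speedups you see will vary with the Python version and machine.

```python
import timeit


def with_append(n):
    # The question's approach: grow the list one element at a time.
    numbers = []
    for num in range(n):
        numbers.append(num)
    return sum(numbers)


def with_comprehension(n):
    # Same list, built with a list comprehension.
    return sum([num for num in range(n)])


def with_range_sum(n):
    # No intermediate list at all.
    return sum(range(n))


if __name__ == "__main__":
    n = 1_000_000
    for fn in (with_append, with_comprehension, with_range_sum):
        seconds = timeit.timeit(lambda: fn(n), number=5)
        print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```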