I was experimenting with the new shiny concurrent.futures module introduced in Python 3.2, and I've noticed that, almost with identical code, using the Pool from concurrent.futures is way slower than using multiprocessing.Pool.
This is the version using multiprocessing:
def hard_work(n):
# Real hard work here
pass
if __name__ == '__main__':
from multiprocessing import Pool, cpu_count
try:
workers = cpu_count()
except NotImplementedError:
workers = 1
pool = Pool(processes=workers)
result = pool.map(hard_work, range(100, 1000000))
And this is using concurrent.futures:
def hard_work(n):
# Real hard work here
pass
if __name__ == '__main__':
from concurrent.futures import ProcessPoolExecutor, wait
from multiprocessing import cpu_count
try:
workers = cpu_count()
except NotImplementedError:
workers = 1
pool = ProcessPoolExecutor(max_workers=workers)
result = pool.map(hard_work, range(100, 1000000))
Using a naïve factorization function taken from this Eli Bendersky article, these are the results on my computer (i7, 64-bit, Arch Linux):
[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:10] $ time python pool_multiprocessing.py
real 0m10.330s
user 1m13.430s
sys 0m0.260s
[juanlu@nebulae]─[~/Development/Python/test]
└[10:31:29] $ time python pool_futures.py
real 4m3.939s
user 6m33.297s
sys 0m54.853s
I cannot profile these with the Python profiler because I get pickle errors. Any ideas?