Consider the following function:
def f(x, dummy=list(range(10000000))):
    return x
If I use multiprocessing.Pool.imap, I get the following timings:
import time
import os
from multiprocessing import Pool

def f(x, dummy=list(range(10000000))):
    return x

start = time.time()
pool = Pool(2)
for x in pool.imap(f, range(10)):
    print("parent process, x=%s, elapsed=%s" % (x, int(time.time() - start)))
parent process, x=0, elapsed=0
parent process, x=1, elapsed=0
parent process, x=2, elapsed=0
parent process, x=3, elapsed=0
parent process, x=4, elapsed=0
parent process, x=5, elapsed=0
parent process, x=6, elapsed=0
parent process, x=7, elapsed=0
parent process, x=8, elapsed=0
parent process, x=9, elapsed=0
Now if I use functools.partial instead of a default value:
import time
import os
from multiprocessing import Pool
from functools import partial

def f(x, dummy):
    return x

start = time.time()
g = partial(f, dummy=list(range(10000000)))
pool = Pool(2)
for x in pool.imap(g, range(10)):
    print("parent process, x=%s, elapsed=%s" % (x, int(time.time() - start)))
parent process, x=0, elapsed=1
parent process, x=1, elapsed=2
parent process, x=2, elapsed=5
parent process, x=3, elapsed=7
parent process, x=4, elapsed=8
parent process, x=5, elapsed=9
parent process, x=6, elapsed=10
parent process, x=7, elapsed=10
parent process, x=8, elapsed=11
parent process, x=9, elapsed=11
Why is the version using functools.partial so much slower?
Using multiprocessing requires sending the worker processes information about the function to run, not just the arguments to pass. That information is transferred by pickling it in the main process, sending it to the worker process, and unpickling it there. This leads to the primary issue:
Pickling a function with default arguments is cheap; it pickles only the name of the function (plus the info to let Python know it's a function); the worker processes just look up the local copy of that name. They already have a named function f to find, so it costs almost nothing to pass it.

But pickling a partial involves pickling the underlying function (cheap) and all of the bound arguments (expensive when the bound argument is a list 10 million elements long). So every time a task is dispatched in the partial case, the bound argument is pickled, sent to the worker process, and unpickled there before the worker finally does the "real" work. On my machine, that pickle is roughly 50 MB in size, which is a huge amount of overhead; in quick timing tests on my machine, pickling and unpickling a 10-million-element list of 0 takes about 620 ms (and that's ignoring the overhead of actually transferring the 50 MB of data).

partials have to pickle this way because they don't know their own names. When pickling a function like f, f (being def-ed) knows its qualified name (in an interactive interpreter or from the main module of a program, it's __main__.f), so the remote side can just recreate it locally by doing the equivalent of from __main__ import f. But the partial doesn't know its name; sure, you assigned it to g, but neither pickle nor the partial itself knows it's available under the qualified name __main__.g; it could be named foo.fred or a million other things. So the partial has to pickle the info needed to recreate itself entirely from scratch. It's also pickled for each call (not just once per worker) because the pool doesn't know that the callable isn't changing in the parent between work items, and it's always trying to ensure it sends up-to-date state.

You have other issues (timing the creation of the list only in the partial case, and the minor overhead of calling a partial-wrapped function vs. calling the function directly), but those are chump change relative to the per-call overhead of pickling and unpickling the partial (the initial creation of the list adds a one-time overhead of a little under half what each pickle/unpickle cycle costs; the overhead of calling through the partial is less than a microsecond).
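You can see this size difference directly by pickling both callables yourself. A quick sketch (I use a 1M-element list here to keep it fast; exact byte counts vary by Python version and pickle protocol):

```python
import pickle
from functools import partial

def f(x, dummy=list(range(1000000))):
    return x

# The def-ed function pickles to little more than its qualified
# name; the default-argument list never leaves this process.
by_name = pickle.dumps(f)

# The partial must pickle the underlying function AND the bound list.
g = partial(f, dummy=list(range(1000000)))
by_value = pickle.dumps(g)

print(len(by_name), len(by_value))

# The partial still round-trips correctly; it is just expensive.
assert pickle.loads(by_value)(5) == 5
```

The first number is tens of bytes, the second several megabytes, and in the Pool.imap example that multi-megabyte payload is produced, transferred, and unpickled once per task.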
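If you hit this in practice, one common workaround (my sketch, not part of the answer above; the names init_worker and _dummy are mine) is to send the big object once per worker via Pool's initializer, so only x is pickled per task:

```python
import time
from multiprocessing import Pool

_dummy = None  # per-worker global, set once by the initializer

def init_worker(dummy):
    # Runs once in each worker process, so the big list is
    # pickled once per worker rather than once per task.
    global _dummy
    _dummy = dummy

def f(x):
    # _dummy is available here without being re-sent for every call
    return x

if __name__ == "__main__":
    big = list(range(1000000))  # 1M elements to keep the demo quick
    start = time.time()
    with Pool(2, initializer=init_worker, initargs=(big,)) as pool:
        for x in pool.imap(f, range(10)):
            print("x=%s, elapsed=%s" % (x, int(time.time() - start)))
```

Since f is a plain module-level function again, each task pickles just its name and the argument, and the elapsed times stay near zero as in the default-argument version.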