I am not sure whether this counts more as an OS issue, but I thought I would ask here in case anyone has some insight from the Python end of things.
I've been trying to parallelise a CPU-heavy `for` loop using `joblib`, but I find that instead of each worker process being assigned to a different core, I end up with all of them being assigned to the same core and no performance gain.
Here's a very trivial example...
```python
from joblib import Parallel, delayed
import numpy as np

def testfunc(data):
    # some very boneheaded CPU work
    for nn in range(1000):
        for ii in data[0, :]:
            for jj in data[1, :]:
                ii * jj

def run(niter=10):
    data = (np.random.randn(2, 100) for ii in range(niter))
    # n_jobs=-1: one worker per CPU; pre_dispatch='all': dispatch every task up front
    pool = Parallel(n_jobs=-1, verbose=1, pre_dispatch='all')
    results = pool(delayed(testfunc)(dd) for dd in data)

if __name__ == '__main__':
    run()
```
...and here's what I see in `htop` while this script is running:
I'm running Ubuntu 12.10 (kernel 3.5.0-26) on a laptop with 4 cores. Clearly `joblib.Parallel` is spawning separate processes for the different workers, but is there any way that I can make these processes execute on different cores?
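As a quick diagnostic, you can print which CPUs the parent process is allowed to run on. This is a sketch assuming Python 3.3+ on Linux, where `os.sched_getaffinity` exists; the helper name `allowed_cpus` is hypothetical. Child processes inherit this mask, so if it contains only one CPU after the heavy imports, every worker will be confined to that core.

```python
import os

def allowed_cpus(pid=0):
    """Sorted CPU ids that process `pid` may run on (0 = this process). Linux only."""
    return sorted(os.sched_getaffinity(pid))

if __name__ == "__main__":
    print("before numpy import:", allowed_cpus())
    try:
        import numpy  # noqa: F401  (the import suspected of narrowing the mask)
    except ImportError:
        print("numpy not installed; nothing to compare")
    else:
        # If this list shrinks to a single CPU, the import changed the affinity.
        print("after numpy import: ", allowed_cpus())
```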
After some more googling I found the answer here.
It turns out that certain Python modules (`numpy`, `scipy`, `tables`, `pandas`, `skimage`...) mess with core affinity on import. As far as I can tell, the problem is specifically caused by them linking against multithreaded OpenBLAS libraries.

A workaround is to reset the task affinity using:
With this line pasted in after the module imports, my example now runs on all cores:
My experience so far has been that this doesn't seem to have any negative effect on `numpy`'s performance, although this is probably machine- and task-specific.

Update:
There are also two ways to disable the CPU affinity-resetting behaviour of OpenBLAS itself. At run-time you can use the environment variable `OPENBLAS_MAIN_FREE` (or `GOTOBLAS_MAIN_FREE`), for example:

Alternatively, if you're compiling OpenBLAS from source, you can permanently disable it at build-time by editing `Makefile.rule` to contain the line:

This appears to be a common problem with Python on Ubuntu, and is not specific to `joblib`:

I would suggest experimenting with CPU affinity (`taskset`).

Python 3 (from 3.3 onwards) now exposes methods to set the affinity directly: `os.sched_setaffinity` and `os.sched_getaffinity`, available on Linux.
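A minimal sketch of that approach, assuming Python 3.4+ on Linux: after the imports that alter the affinity, widen the mask of the current process back to every CPU before creating the worker pool. The helper name `reset_affinity` is hypothetical, and using `range(os.cpu_count())` assumes CPUs are numbered contiguously from 0.

```python
import os

def reset_affinity(pid=0):
    """Allow process `pid` (0 = the calling process) to run on all CPUs.

    Hypothetical helper; Linux only (os.sched_setaffinity).
    Assumes CPUs are numbered 0 .. os.cpu_count() - 1.
    """
    all_cpus = set(range(os.cpu_count()))
    # The kernel intersects this mask with any cpuset/cgroup restriction,
    # so the effective set can still be smaller than all_cpus.
    os.sched_setaffinity(pid, all_cpus)
    return os.sched_getaffinity(pid)

if __name__ == "__main__":
    # Call this after `import numpy` (or scipy, pandas, ...) and before
    # creating the joblib.Parallel pool, so the workers inherit the wider mask.
    print("allowed CPUs:", sorted(reset_affinity()))
```

The ordering matters because child processes inherit the parent's affinity mask when they are created; resetting it after the pool's workers have already started would not affect them.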