python joblib Parallel on Windows not working even

2020-06-04 03:56发布

问题:

I'm running parallel processing in Python on Windows. Here's my code:

from joblib import Parallel, delayed

def f(x): 
    return sqrt(x)

if __name__ == '__main__':
    a = Parallel(n_jobs=2)(delayed(f)(i) for i in range(10))

Here's the error message:

Process PoolWorker-2:  
Process PoolWorker-1:  
Traceback (most recent call last):    
File "C:\Users\yoyo__000.BIGBLACK\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.4.3105.win-x86_64\lib\multiprocessing\process.py", line 258, in _bootstrap
self.run()   
File "C:\Users\yoyo__000.BIGBLACK\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.4.3105.win-x86_64\lib\multiprocessing\process.py", line 114, in run
self._target(*self._args, **self._kwargs)   
File "C:\Users\yoyo__000.BIGBLACK\AppData\Local\Enthought\Canopy\App\appdata\canopy-1.5.4.3105.win-x86_64\lib\multiprocessing\pool.py", line 102, in worker
task = get()   
File "C:\Users\yoyo__000.BIGBLACK\AppData\Local\Enthought\Canopy\User\lib\site-packages\joblib\pool.py", line 363, in get
return recv()  
AttributeError: 'module' object has no attribute 'f'

回答1:

According to this site the problem is Windows specific:

Yes: under linux we are forking, thus their is no need to pickle the function, and it works fine. Under windows, the function needs to be pickleable, ie it needs to be imported from another file. This is actually good practice: making modules pushes for reuse.

I've tried your code and it works flawlessly under Linux. Under Windows it runs OK if it is run from a script, like python script_with_your_code.py. But it fails when ran in an interactive python session. It worked for me when I saved the f function in separate module and imported it into my interactive session.

NOT WORKING:
Interactive session:

>>> from math import sqrt
>>> from joblib import Parallel, delayed

>>> def f(x):
...     return sqrt(x)

>>> if __name__ == '__main__':
...     a = Parallel(n_jobs=2)(delayed(f)(i) for i in range(10))
...
Process PoolWorker-1:
Traceback (most recent call last):
  File "C:\Python27\lib\multiprocessing\process.py", line 258, in _bootstrap
    self.run()
  File "C:\Python27\lib\multiprocessing\process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Python27\lib\multiprocessing\pool.py", line 102, in worker
    task = get()
  File "C:\Python27\lib\site-packages\joblib\pool.py", line 359, in get
    return recv()
AttributeError: 'module' object has no attribute 'f'


WORKING:
fun.py

from math import sqrt

def f(x):
    return sqrt(x)

Interactive session:

>>> from joblib import Parallel, delayed
>>> from fun import f

>>> if __name__ == '__main__':
...     a = Parallel(n_jobs=2)(delayed(f)(i) for i in range(10))
...
>>> a
[0.0, 1.0, 1.4142135623730951, 1.7320508075688772, 2.0, 2.23606797749979, 2.449489742783178, 2.6457513110645907, 2.8284271247461903, 3.0]