Pathos multiprocessing can't see packages imported in the main module

Posted 2019-02-26 18:57

I want to do multiprocessing inside a class. It seems like only pathos.multiprocessing can help me. However, when I implement it, the worker processes can't load the packages I import in the main module.

from pathos.multiprocessing import ProcessingPool
import time
import sys
import datetime


class tester:
    def __init__(self):
        self.pool = ProcessingPool(2)

    def func(self, msg):
        print(str(datetime.datetime.now()))
        for i in xrange(1):
            print msg
            sys.stdout.flush()
        time.sleep(2)

#----------------------------------------------------------------------
    def worker(self):
        """"""
        pool = self.pool
        for i in xrange(10):
            msg = "hello %d" % i
            pool.map(self.func, [i])
        pool.close()
        pool.join()
        time.sleep(40)



if __name__ == "__main__":
    print datetime.datetime.now()
    t = tester()
    t.worker()
    time.sleep(60)
    print "Sub-process(es) done."

The error is "global name 'datetime' is not defined". But it works in the main function! My system is Win7.

1 Answer

狗以群分 · 2019-02-26 19:20

I'm the author of pathos. If you execute your code on non-Windows systems, it works fine -- even from the interpreter. (It also works from a file, as-is.)

>>> from pathos.multiprocessing import ProcessingPool;
>>> import time
>>> import sys;
>>> import datetime
>>> class tester:
...     def __init__(self):
...         self.pool=ProcessingPool(2);
...     def func(self,msg):
...         print (str(datetime.datetime.now()));
...         for i in xrange(1):
...             print msg
...             sys.stdout.flush();
...         time.sleep(2)    
...     def worker(self):
...         """"""
...         pool=self.pool
...         for i in xrange(10):
...                msg = "hello %d" %(i)
...                pool.map(self.func,[i])
...         pool.close()
...         pool.join()
...         time.sleep(40)
... 
>>> datetime.datetime.now()
datetime.datetime(2015, 10, 21, 19, 24, 16, 131225)
>>> t = tester()
>>> t.worker()
2015-10-21 19:24:25.927781
0
2015-10-21 19:24:27.933611
1
2015-10-21 19:24:29.938630
2
2015-10-21 19:24:31.942376
3
2015-10-21 19:24:33.946052
4
2015-10-21 19:24:35.949965
5
2015-10-21 19:24:37.953877
6
2015-10-21 19:24:39.957770
7
2015-10-21 19:24:41.961704
8
2015-10-21 19:24:43.965193
9
>>>

The issue is that multiprocessing is fundamentally different on Windows, in that Windows doesn't have a true fork… and thus isn't as flexible as systems with a fork. On Windows, multiprocessing has a forking pickler that under the covers spawns a subprocess… while non-Windows systems can utilize shared memory across the processes.
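
As a minimal illustration of the difference (a hypothetical sketch using the standard-library multiprocessing rather than pathos): state created inside __main__ is inherited by forked children but invisible to spawned ones, because a spawned child re-imports the module and never runs the __main__ block.

import multiprocessing as mp

FLAG = []

def probe(_):
    # forked children inherit the parent's memory at fork time;
    # spawned children re-import this module, so FLAG is empty there
    return len(FLAG)

if __name__ == "__main__":
    mp.freeze_support()
    FLAG.append(1)  # mutate state after import, before the pool starts
    pool = mp.Pool(2)
    print pool.map(probe, [0, 0])  # fork (Linux/Mac): [1, 1]; spawn (Windows): [0, 0]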

dill has a check and a copy method that do a sequential loads(dumps(object)) on some object, where copy uses shared memory, while check uses a subprocess (as multiprocessing does on Windows). Here's the check method on a Mac, so apparently that's not the issue.

>>> import dill
>>> dill.check(t.func)
<bound method tester.func of <__main__.tester instance at 0x1051c7998>>
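
For completeness, a quick sketch of exercising both round-trips on the tester instance t from the session above (outputs elided):

import dill

dill.copy(t.func)   # loads(dumps(...)) in the current process
dill.check(t.func)  # same round-trip, but run in a spawned subprocess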

The other thing you need to do on Windows is to use freeze_support at the beginning of __main__ (i.e. as the first line of __main__). It's unnecessary on non-Windows systems, but pretty much necessary on Windows. Here's the doc.

>>> import pathos
>>> print pathos.multiprocessing.freeze_support.__doc__

    Check whether this is a fake forked process in a frozen executable.
    If so then run code specified by commandline and exit.

>>>
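
Putting that together, a sketch of how the question's __main__ block would start on Windows (pathos re-exports freeze_support, as shown above):

from pathos.multiprocessing import freeze_support

if __name__ == "__main__":
    freeze_support()  # must come first in __main__ on Windows
    print datetime.datetime.now()
    t = tester()
    t.worker()
    time.sleep(60)
    print "Sub-process(es) done."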