Multiprocessing works in Ubuntu, doesn't in Windows

Posted 2019-02-18 10:06

Question:

I am trying to use this example as a template for a queuing system in my CherryPy app.

I was able to convert it from Python 2 to Python 3 (changing "from Queue import Empty" to "from queue import Empty") and run it on Ubuntu. But when I run it on Windows I get the following error:

F:\workspace\test>python test.py
Traceback (most recent call last):
  File "test.py", line 112, in <module>
    broker.start()
  File "C:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\Anaconda3\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Anaconda3\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Anaconda3\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot serialize '_io.TextIOWrapper' object

F:\workspace\test>Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Anaconda3\lib\multiprocessing\spawn.py", line 100, in spawn_main
    new_handle = steal_handle(parent_pid, pipe_handle)
  File "C:\Anaconda3\lib\multiprocessing\reduction.py", line 81, in steal_handle
    _winapi.PROCESS_DUP_HANDLE, False, source_pid)
OSError: [WinError 87] The parameter is incorrect

Here is the full code:

# from http://www.defuze.org/archives/198-managing-your-process-with-the-cherrypy-bus.html

import sys
import logging
from logging import handlers

from cherrypy.process import wspbus

class MyBus(wspbus.Bus):
    def __init__(self, name=""):
        wspbus.Bus.__init__(self)
        self.open_logger(name)
        self.subscribe("log", self._log)

    def exit(self):
        wspbus.Bus.exit(self)
        self.close_logger()

    def open_logger(self, name=""):
        logger = logging.getLogger(name)
        logger.setLevel(logging.INFO)
        h = logging.StreamHandler(sys.stdout)
        h.setLevel(logging.INFO)
        h.setFormatter(logging.Formatter("[%(asctime)s] %(name)s - %(levelname)s - %(message)s"))
        logger.addHandler(h)

        self.logger = logger

    def close_logger(self):
        for handler in self.logger.handlers:
            handler.flush()
            handler.close()

    def _log(self, msg="", level=logging.INFO):
        self.logger.log(level, msg)



import random
import string
from multiprocessing import Process

class Bank(object):
    def __init__(self, queue):
        self.bus = MyBus(Bank.__name__)
        self.queue = queue
        self.bus.subscribe("main", self.randomly_place_order)
        self.bus.subscribe("exit", self.terminate)

    def randomly_place_order(self):
        order = random.sample(['BUY', 'SELL'], 1)[0]
        code = random.sample(string.ascii_uppercase, 4)
        amount = random.randint(0, 100)

        message = "%s %s %d" % (order, ''.join(code), amount)

        self.bus.log("Placing order: %s" % message)

        self.queue.put(message)

    def run(self):
        self.bus.start()
        self.bus.block(interval=0.01)

    def terminate(self):
        self.bus.unsubscribe("main", self.randomly_place_order)
        self.bus.unsubscribe("exit", self.terminate)


from queue import Empty

class Broker(Process):
    def __init__(self, queue):
        Process.__init__(self)
        self.queue = queue
        self.bus = MyBus(Broker.__name__)
        self.bus.subscribe("main", self.check)

    def check(self):
        try:
            message = self.queue.get_nowait()
        except Empty:
            return

        if message == "stop":
            self.bus.unsubscribe("main", self.check)
            self.bus.exit()
        elif message.startswith("BUY"):
            self.buy(*message.split(' ', 2)[1:])
        elif message.startswith("SELL"):
            self.sell(*message.split(' ', 2)[1:])

    def run(self):
        self.bus.start()
        self.bus.block(interval=0.01)

    def stop(self):
        self.queue.put("stop")

    def buy(self, code, amount):
        self.bus.log("BUY order placed for %s %s" % (amount, code))

    def sell(self, code, amount):
        self.bus.log("SELL order placed for %s %s" % (amount, code))




if __name__ == '__main__':
    from multiprocessing import Queue
    queue = Queue()

    broker = Broker(queue)
    broker.start()

    bank = Bank(queue)
    bank.run()

Answer 1:

The problem is that parts of the MyBus object are not picklable, and you're saving an instance of MyBus to your Broker instance. Because Windows lacks fork() support, when you call broker.start(), the entire state of broker must be pickled and recreated in the child process that multiprocessing spawns to execute broker.run. It works on Linux because Linux supports fork; it doesn't need to pickle anything in this case - the child process contains the complete state of the parent as soon as it is forked.
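To see the failure in isolation, here is a minimal sketch that pickles by hand, which is roughly what multiprocessing has to do with the Process object on Windows. HoldsStream and can_pickle are made-up names for illustration, and the in-memory TextIOWrapper stands in for the sys.stdout stream held by the logging handler.

```python
import io
import pickle


class HoldsStream(object):
    """Hypothetical stand-in for an object that keeps an open text stream."""

    def __init__(self):
        # Same type as sys.stdout in the traceback: _io.TextIOWrapper.
        self.stream = io.TextIOWrapper(io.BytesIO())


def can_pickle(obj):
    """Return True if pickle accepts obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


print(can_pickle({"queue": "plain data"}))  # plain data pickles fine
print(can_pickle(HoldsStream()))            # the text stream makes this fail
```

With fork nothing like this pickling step is needed, which is why the same script runs on Ubuntu.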

There are two ways to solve this problem. The first, and more difficult, way is to make your broker instance picklable. To do that, you need to make MyBus picklable. The error you're getting right now refers to the logger attribute on MyBus, which is not picklable. That one is easy to fix; just add __getstate__/__setstate__ methods to MyBus, which control how the object is pickled and unpickled. If we remove the logger when we pickle, and recreate it when we unpickle, we'll work around the issue:

class MyBus(wspbus.Bus):
    ... 
    def __getstate__(self):
        # Copy first, so the parent's own bus keeps its logger.
        self_dict = self.__dict__.copy()
        del self_dict['logger']
        return self_dict

    def __setstate__(self, d):
        self.__dict__.update(d)
        self.open_logger()
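The same pattern can be tried without CherryPy. This is only a sketch: Recorder and open_stream are hypothetical names, and an in-memory text stream plays the role of the unpicklable logger. It copies __dict__ before deleting from it, so the original object is left intact after pickling.

```python
import io
import pickle


class Recorder(object):
    """Hypothetical class holding one attribute that pickle refuses."""

    def __init__(self, name=""):
        self.name = name
        self.open_stream()

    def open_stream(self):
        # Stand-in for open_logger(): creates the unpicklable attribute.
        self.stream = io.TextIOWrapper(io.BytesIO())

    def __getstate__(self):
        # Work on a copy so this instance keeps its own stream.
        state = self.__dict__.copy()
        del state["stream"]
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes, then rebuild the dropped one.
        self.__dict__.update(state)
        self.open_stream()


original = Recorder("Broker")
clone = pickle.loads(pickle.dumps(original))
print(clone.name)
```

The clone gets a fresh stream of its own, and the original still has the one it started with.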

This works, but then we hit another pickling error:

Traceback (most recent call last):
  File "async2.py", line 121, in <module>
    broker.start()
  File "C:\python34\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\python34\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\python34\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)
  File "C:\python34\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\python34\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'cherrypy.process.wspbus._StateEnum.State'>: attribute lookup State on cherrypy.process.wspbus failed

It turns out that cherrypy.process.wspbus._StateEnum.State, which is an attribute on the wspbus.Bus class inherited by MyBus, is a nested class, and nested classes can't be pickled here (this is Python 3.4, as the traceback shows):

class _StateEnum(object):
    class State(object):
        name = None
        def __repr__(self):
            return "states.%s" % self.name

The State object (surprise) is used to track the Bus instance's state. Since we're doing the pickling before we start up the bus, we can just remove the state attribute from the object when we pickle, and set it to wspbus.states.STOPPED when we unpickle:

class MyBus(wspbus.Bus):
    def __init__(self, name=""):
        wspbus.Bus.__init__(self)
        self.open_logger(name)
        self.subscribe("log", self._log)

    def __getstate__(self):
        # Copy first, so the parent's bus keeps its logger and state.
        self_dict = self.__dict__.copy()
        del self_dict['logger']
        del self_dict['state']
        return self_dict

    def __setstate__(self, d):
        self.__dict__.update(d)
        self.open_logger()
        self.state = wspbus.states.STOPPED  # Initialize to STOPPED

With these changes, the code works as expected! The only limitation is that it's only safe to pickle MyBus if the bus hasn't started yet, which is fine for your use case.
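If you want that limitation enforced rather than just remembered, one option is to refuse to pickle once the bus is running. This is a rough sketch and not CherryPy's API: MiniBus and the STOPPED/STARTED strings are stand-ins invented here for wspbus.Bus and its states.

```python
import pickle

# Hypothetical stand-ins for the real bus states.
STOPPED = "STOPPED"
STARTED = "STARTED"


class MiniBus(object):
    """Toy bus that may only be pickled while it is stopped."""

    def __init__(self):
        self.state = STOPPED
        self.listeners = {}

    def start(self):
        self.state = STARTED

    def __getstate__(self):
        if self.state != STOPPED:
            raise pickle.PicklingError("pickle the bus before starting it")
        state = self.__dict__.copy()
        del state["state"]  # dropped here, reset on unpickle
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.state = STOPPED


copy_of_fresh_bus = pickle.loads(pickle.dumps(MiniBus()))
print(copy_of_fresh_bus.state)
```

Pickling a MiniBus after start() raises PicklingError instead of silently producing a copy that claims to be stopped.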

Again, this is the hard way. The easy way is to just remove the need to pickle the MyBus instance altogether. You can just create the MyBus instance in the child process, rather than the parent:

class Broker(Process):
    def __init__(self, queue):
        Process.__init__(self)
        self.queue = queue

...
    def run(self):
        self.bus = MyBus(Broker.__name__)  # Create the instance here, in the child
        self.bus.subscribe("main", self.check)
        self.bus.start()
        self.bus.block(interval=0.01)

As long as you don't need to access broker.bus in the parent, this is the simpler option.
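Here is a small sketch of why moving the creation into run() helps. It assumes nothing about CherryPy: EagerWorker, LazyWorker and survives_pickling are hypothetical names, an in-memory text stream stands in for the bus, and a manual pickle round trip stands in for what multiprocessing does at start() on Windows.

```python
import io
import pickle


class EagerWorker(object):
    """Creates the unpicklable resource in __init__, i.e. in the parent."""

    def __init__(self):
        self.stream = io.TextIOWrapper(io.BytesIO())


class LazyWorker(object):
    """Creates the resource in run(), i.e. only after the copy exists."""

    def __init__(self):
        self.stream = None

    def run(self):
        self.stream = io.TextIOWrapper(io.BytesIO())
        self.stream.write("running")
        self.stream.flush()
        return self.stream.buffer.getvalue()


def survives_pickling(obj):
    """Return a pickled-and-restored copy of obj, or None if pickling fails."""
    try:
        return pickle.loads(pickle.dumps(obj))
    except TypeError:
        return None


print(survives_pickling(EagerWorker()) is None)  # eager version cannot be sent
child_copy = survives_pickling(LazyWorker())
print(child_copy.run())                          # lazy version builds it after the copy
```

The lazy version never asks pickle to handle the stream, which is the same reason the Broker above can be started once its bus is built in run().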