As almost everyone is aware when they first look at threading in Python, there is the GIL that makes life miserable for people who actually want to do processing in parallel - or at least give it a chance.
I am currently looking at implementing something like the Reactor pattern. Effectively I want to listen for incoming socket connections on one thread-like, and when someone tries to connect, accept that connection and pass it along to another thread-like for processing.
I'm not (yet) sure what kind of load I might be facing. I know there is currently a 2MB cap set up on incoming messages. Theoretically we could get thousands per second (though I don't know if practically we've seen anything like that). The amount of time spent processing a message isn't terribly important, though obviously quicker would be better.
I was looking into the Reactor pattern, and developed a small example using the multiprocessing
library that (at least in testing) seems to work just fine. However, now/soon we'll have the asyncio library available, which would handle the event loop for me.
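The small example mentioned above isn't included, but a multiprocessing-only listener/worker split along those lines might look something like the sketch below (the handler name, worker count, and queue-based handoff are all illustrative, not the asker's actual code):

```python
import multiprocessing

MAX_MESSAGE = 2 * 1024 * 1024  # the 2MB cap mentioned above

def handle_message(data):
    # Placeholder for the real per-message processing.
    return len(data)

def worker(tasks, results):
    """Runs in a child process: drain messages until a None sentinel."""
    while True:
        data = tasks.get()
        if data is None:
            break
        results.put(handle_message(data))

if __name__ == "__main__":
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(tasks, results))
             for _ in range(2)]
    for p in procs:
        p.start()
    # A socket accept() loop would normally feed this queue; fake two messages.
    for msg in (b"hello", b"world!"):
        tasks.put(msg)
    for _ in procs:
        tasks.put(None)
    sizes = sorted(results.get() for _ in range(2))
    for p in procs:
        p.join()
    print(sizes)  # [5, 6]
```

In a real version, the accepting thread-like would put each received payload on `tasks` instead of the hard-coded messages.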
Is there anything that could bite me by combining asyncio and multiprocessing?
You should be able to safely combine asyncio and multiprocessing without too much trouble, though you shouldn't be using multiprocessing directly. The cardinal sin of asyncio (and any other event-loop based asynchronous framework) is blocking the event loop. If you try to use multiprocessing directly, any time you block to wait for a child process, you're going to block the event loop. Obviously, this is bad.

The simplest way to avoid this is to use
BaseEventLoop.run_in_executor to execute a function in a concurrent.futures.ProcessPoolExecutor. ProcessPoolExecutor is a process pool implemented using multiprocessing.Process, but asyncio has built-in support for executing a function in it without blocking the event loop.

For the majority of cases, this function alone is good enough. If you find yourself needing other constructs from
multiprocessing, like Queue, Event, Manager, etc., there is a third-party library called aioprocessing (full disclosure: I wrote it) that provides asyncio-compatible versions of all the multiprocessing
data structures.

Yes, there are quite a few bits that may (or may not) bite you.

- asyncio expects to run on one thread or process. This does not (by itself) work with parallel processing. You somehow have to distribute the work while leaving the IO operations (specifically those on sockets) in a single thread/process.
- You cannot take a socket out of asyncio without closing it. The next obstacle is that you cannot simply send a file descriptor to a different process unless you use platform-specific (probably Linux) code from a C extension.
- The multiprocessing module is known to create a number of threads for communication. Most of the time, when you use communication structures (such as Queues), a thread is spawned. Unfortunately those threads are not completely invisible. For instance, they can fail to tear down cleanly (when you intend to terminate your program), and depending on their number, their resource usage may be noticeable on its own.

If you really intend to handle individual connections in individual processes, I suggest examining different approaches. For instance, you can put a socket into listen mode and then simultaneously accept connections from multiple worker processes in parallel. Once a worker has finished processing a request, it can go accept the next connection, so you still use fewer resources than forking a process for each connection. SpamAssassin and Apache (mpm prefork) can use this worker model, for instance. It might end up easier and more robust depending on your use case. Specifically, you can make your workers die after serving a configured number of requests and be respawned by a master process, thereby eliminating much of the negative effect of memory leaks.
See PEP 3156, in particular the section on Thread interaction:
http://www.python.org/dev/peps/pep-3156/#thread-interaction
This clearly documents the new asyncio methods you might use, including run_in_executor(). Note that the Executor is defined in concurrent.futures; I suggest you also have a look there.
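A rough sketch of run_in_executor() with a ProcessPoolExecutor, using the modern asyncio.run()/get_running_loop() spelling (the cpu_bound function here is just a stand-in for real work):

```python
import asyncio
import time
from concurrent.futures import ProcessPoolExecutor

def cpu_bound(n):
    """Stands in for heavy per-message processing; runs in a child process."""
    time.sleep(0.1)
    return n * 2

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # Each call is shipped to a pool process; awaiting the returned
        # futures never blocks the event loop itself.
        futures = [loop.run_in_executor(pool, cpu_bound, i) for i in range(5)]
        return await asyncio.gather(*futures)

if __name__ == "__main__":
    results = asyncio.run(main())
    print(results)  # [0, 2, 4, 6, 8]
```

While the pool is busy, the event loop remains free to accept connections and schedule other coroutines.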