Multiprocessing Queue maxsize limit is 32767

2019-01-14 08:18发布

问题:

I'm trying to write a Python 2.6 (OSX) program using multiprocessing, and I want to populate a Queue with more than the default of 32767 items.

from multiprocessing import Queue
Queue(2**15) # raises OSError

Queue(32767) works fine, but any higher number (e.g. Queue(32768)) fails with OSError: [Errno 22] Invalid argument

Is there a workaround for this issue?

回答1:

One approach would be to wrap your multiprocessing.Queue with a custom class (just on the producer side, or transparently from the consumer perspective). Using that you would queue up items to be dispatched to the Queue object that you're wrapping, and only feed things from the local queue (Python list() object) into the multiprocess.Queue as space becomes available, with exception handling to throttle when the Queue is full.

That's probably the easiest approach since it should have the minimum impact on the rest of your code. The custom class should behave just like a Queue while hiding the underlying multiprocessing.Queue behind your abstraction.

(One approach might be to have your producer use threads, one thread to manage the dispatch from a threading Queue to your multiprocessing.Queue and any other threads actually just feeding the threading Queue).



回答2:

I've already answered the original question but I do feel like adding that Redis lists are quite reliable and the Python module's support for them are extremely easy to use for implementing a Queue like object. These have the advantage of allowing one to scale out over multiple nodes (across a network) as well as just over multiple processes.

Basically to use those you'd just pick a key (string) for your queue name have your producers push into it and have your workers (task consumers) loop on blocking pops from that key.

The Redis BLPOP, and BRPOP commands all take a list of keys (lists/queues) and an optional timeout value. They return a tuple (key,value) or None (on timeout). So you can easily write up an event driven system that's very similar to the familiar structure of select() (but at a much higher level). The only thing you have to watch for are missing keys and invalid key types (just wrap your queue operations with exception handlers, of course). (If some other application stops on your shared Redis server removing keys or replacing keys that you were using as queues with string/integer or other types of values ... well, you have a different problem at that point). :)

Another advantage of this model is that Redis does persist its data to the disk. So your work queue could survive system restarts if you chose to allow it.

(Of course you could implement a simple Queue as a table in SQLlite or any other SQL system if you really wanted to do so; just using some sort of auto-incrementing index for the sequencing and a column to mark each item has having been "done" (consumed); but that does involve somewhat more complexity than using what Redis gives you "out of the box").



回答3:

Working for me on MacOSX

>>> import Queue
>>> Queue.Queue(30000000)
<Queue.Queue instance at 0x1006035f0>