I'm having this problem in Python:
- I have a queue of URLs that I need to check from time to time.
- If the queue has been filled, I need to process each item in it.
- Each item in the queue must be processed by a single process (multiprocessing).
So far I have managed to achieve this "manually" like this:
while 1:
    self.updateQueue()
    while not self.mainUrlQueue.empty():
        domain = self.mainUrlQueue.get()
        # if we haven't launched enough processes yet, we need to do so
        if len(self.jobs) < maxprocess:
            self.startJob(domain)
            #time.sleep(1)
        else:
            # If we already have processes started, we need to clear the old processes in our pool and start new ones
            jobdone = 0
            # We cycle through the processes until we find a free one; only then do we leave the loop
            while jobdone == 0:
                for p in self.jobs:
                    #print "entering loop"
                    # if the process finished
                    if not p.is_alive() and jobdone == 0:
                        #print str(p.pid) + " job dead, starting new one"
                        self.jobs.remove(p)
                        self.startJob(domain)
                        jobdone = 1
However, that leads to tons of problems and errors. I wonder whether I would be better off using a Pool of processes. What would be the right way to do this?
Also, my queue is empty much of the time, and then it can be filled with 300 items in a second, so I'm not too sure how to handle that.
You could use the blocking capabilities of queue to spawn multiple processes at startup (using multiprocessing.Pool) and let them sleep until some data is available on the queue to process. If you're not familiar with that, you could try to "play" with this simple program:
Tested with Python 2.7.3 on Linux.
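A minimal sketch of such a program, assuming Python 3 on a POSIX system (it relies on the fork start method) rather than the Python 2.7 setup mentioned above. The STOP sentinel, the results queue, and the run_demo helper are hypothetical additions so that the demo can finish and report which worker handled which item; worker_main is the loop that each pool process runs as its initializer.

```python
import multiprocessing
import os
import time

STOP = None  # hypothetical sentinel: tells a worker to leave its loop


def worker_main(queue, results):
    """Run in each pool process: block on the queue, handle one item, repeat."""
    while True:
        item = queue.get()  # blocks (the worker sleeps) while the queue is empty
        if item is STOP:
            break
        time.sleep(0.1)  # simulate a "long" operation
        results.put((os.getpid(), item))


def run_demo(items, n_workers=3):
    """Feed items to n_workers blocked workers; return (pid, item) pairs."""
    # Assumption: the "fork" start method is available (Linux and most POSIX systems).
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    results = ctx.Queue()
    # Pool(processes, initializer, initargs): every worker starts by calling
    # worker_main(queue, results) and stays inside that loop.
    pool = ctx.Pool(n_workers, worker_main, (queue, results))
    for item in items:
        queue.put(item)  # wakes up one of the waiting workers
    processed = [results.get() for _ in items]
    for _ in range(n_workers):
        queue.put(STOP)  # one sentinel per worker
    pool.close()
    pool.join()
    return processed
```

Calling run_demo(["hello", "world"] * 5) returns ten (pid, item) pairs; which worker picks up which item, and in what order, varies from run to run. In a long-running service you would skip the sentinel and simply keep putting new URLs on the queue whenever they arrive.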
This will spawn 3 processes (in addition to the parent process). Each child executes the worker_main function. It is a simple loop that gets a new item from the queue on each iteration. Workers will block if nothing is ready to process.
At startup, all 3 processes will sleep until the queue is fed with some data. When an item becomes available, one of the waiting workers gets it and starts to process it. After that, it tries to get another item from the queue, waiting again if nothing is available...