python-rq worker closes automatically

2019-09-11 13:27发布

问题:

I am implementing python-rq to pass domains in a queue and scrape it using Beautiful Soup. So i am running multiple workers to get the job done. I started 22 workers as of now, and all the 22 workers is registered in the rq dashboard. But after some time the worker stops by itself and is not getting displayed in dashboard. But in webmin, it displays all workers as running. The speed of crawling has also decreased i.e. the workers are not running. I tried running the worker using supervisor and nohup. In both the cases the workers stops by itself.

What is the reason for this? Why does workers stops by itself? And how many workers can we start in a single server?

Along with that, whenever a worker is unregistered from the rq dashboard, the failed count increases. I don't understand why?

Please help me with this. Thank You

回答1:

Okay I figured out the problem. It was because of worker timeout.

try:
  --my code goes here--
except Exception, ex:
  self.error += 1
  with open("error.txt", "a") as myfile:
     myfile.write('\n%s' % sys.exc_info()[0] + "{}".format(self.url))
  pass

So according to my code, the next domain is dequeued if 200 url(s) is fetched from each domain. But for some domains there were insufficient number of urls for the condition to terminate (like only 1 or 2 urls).

Since the code catches all the exception and appends to error.txt file. Even the rq timeout exception rq.timeouts.JobTimeoutException was caught and was appended to the file. Thus making the worker to wait for x amount of time, which leads to termination of the worker.