I am running Scrapy spiders inside Celery tasks, and I randomly get errors like this:
Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/twisted/python/log.py", line 103, in callWithLogger
    return callWithContext({"system": lp}, func, *args, **kw)
  File "/usr/lib/python2.7/site-packages/twisted/python/log.py", line 86, in callWithContext
    return context.call({ILogContext: newCtx}, func, *args, **kw)
  File "/usr/lib/python2.7/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/lib/python2.7/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
--- <exception caught here> ---
  File "/usr/lib/python2.7/site-packages/twisted/internet/posixbase.py", line 602, in _doReadOrWrite
    why = selectable.doWrite()
exceptions.AttributeError: '_SIGCHLDWaker' object has no attribute 'doWrite'
I am using:
celery==3.1.19
Django==1.9.4
Scrapy==1.3.0
This is how I run Scrapy inside Celery:
from billiard import Process
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings


class MyCrawlerScript(Process):
    def __init__(self, **kwargs):
        Process.__init__(self)
        # get_project_settings() takes no arguments; it finds the project via
        # scrapy.cfg and the SCRAPY_SETTINGS_MODULE environment variable.
        settings = get_project_settings()
        self.crawler = CrawlerProcess(settings)
        self.spider_name = kwargs.get('spider_name')
        self.kwargs = kwargs

    def run(self):
        # Executed in the child process: schedule the spider, then start the reactor.
        self.crawler.crawl(self.spider_name, qwargs=self.kwargs)
        self.crawler.start()


def my_crawl_manager(**kwargs):
    crawler = MyCrawlerScript(**kwargs)
    crawler.start()  # fork the child process
    crawler.join()   # block until the crawl finishes
Inside a Celery task, I call:
my_crawl_manager(spider_name='my_spider', url='www.google.com/any-url-here')
Does anyone have an idea why this is happening?
P.S.: I have asked another question, "Why I am Getting KeyError in Scrapy?". I don't know whether the two problems are related.
I had the same issue. I'm working within a complex application that uses asyncio, multiprocessing, Twisted and Scrapy all together. The solution for me was to use asyncioreactor, installing the alternate reactor before importing anything from scrapy.