I'm using Scrapy to run a spider and I get the following error:
DEBUG: Retrying http://xixichengyuanlc.fang.com/esf/> (failed 2 times): An error occurred while connecting: [Failure instance: Traceback (failure with no frames): : Connection to the other side was lost in a non-clean fashion: Connection lost.
I have run this spider successfully several times before, but after I tried adding some user agents to make it run faster, I started getting the error above. At first I thought something might be wrong with my user agents, so I checked them, but I still can't figure it out. I then went back to the earlier version of the spider, but I still get the same error.
Below is my settings.py:
# Scrapy settings for soufang project
SPIDER_MODULES = ['soufang.spiders']
NEWSPIDER_MODULE = 'soufang.spiders'
DEFAULT_ITEM_CLASS = 'soufang.items.Community_info'
ITEM_PIPELINES = ['soufang.pipelines.MySQLStorePipeline']
#DOWNLOADER_MIDDLEWARES={
#'scrapy.contrib.downloadermiddleware.useragent.UserAgentMiddleware': None,
#'soufang.misc.middlewares.CustomUserAgentMiddleware':400}
The ITEM_PIPELINES setting is not a list, but a dict. Other than that, I can't say exactly what's wrong. I don't see that you have set USER_AGENT in your settings. Also, please paste the full log.
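To illustrate the dict form of ITEM_PIPELINES mentioned above, here is a minimal sketch of how those lines in settings.py might look. The order value 300 and the USER_AGENT string are placeholders I am assuming for illustration; they are not taken from the original project. Scrapy conventionally uses order values in the 0-1000 range, with lower values running first.

```python
# settings.py (sketch, not the original project's file)

# ITEM_PIPELINES as a dict: pipeline class path -> order value.
# 300 is an assumed, arbitrary order; any integer in the usual
# 0-1000 range works when there is only one pipeline.
ITEM_PIPELINES = {
    'soufang.pipelines.MySQLStorePipeline': 300,
}

# An explicit user agent. This string is a hypothetical placeholder;
# replace it with whatever identifies your crawler.
USER_AGENT = 'soufang (+http://www.example.com)'
```

This only addresses the settings format. It may or may not be related to the "Connection lost" error, which is why the full log would still help.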