scrapy crawl [spider-name] fault

Posted 2019-02-20 02:21

Question:

Hi guys, I am building a web scraping project using the Scrapy framework and Python. In the spiders folder of my project I have two spiders, named spider1 and spider2:

spider1.py

class spider(BaseSpider):
    name= "spider1"
    ........
    ........

spider2.py

class spider(BaseSpider):
    name="spider2"
    ............
    ...........

settings.py

SPIDER_MODULES = ['project_name.spiders']
NEWSPIDER_MODULE = ['project_name.spiders']
ITEM_PIPELINES = ['project_name.pipelines.spider']

Now when I run the command scrapy crawl spider1 in my project's root folder, it runs spider2.py instead of spider1.py. If I delete spider2.py from the project, it runs spider1.py.

It had been working fine for a month until a day ago, and I can't figure out what suddenly changed. Please help me, guys.

Answer 1:

I ran into the same problem; removing all *.pyc files from everywhere in my project did the job.

In particular, I think settings.pyc is important to remove.

Hope that helps.
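
In case it helps, here is a minimal sketch of the cleanup described above. The function name remove_pyc_files is made up for illustration, and it assumes you pass it your project's root folder; it simply walks the tree and deletes every *.pyc file it finds, settings.pyc included.

```python
from pathlib import Path


def remove_pyc_files(project_root):
    """Delete every *.pyc file under project_root and return the removed paths.

    remove_pyc_files is a hypothetical helper name, not part of Scrapy.
    """
    removed = []
    for pyc in Path(project_root).rglob("*.pyc"):
        if pyc.is_file():
            pyc.unlink()  # delete the cached bytecode file
            removed.append(pyc)
    return removed


# Example (assumes the current directory is the project root):
# for path in remove_pyc_files("."):
#     print("removed", path)
```

Python regenerates the bytecode for any module that still has a .py source the next time it is imported, so deleting these files should be safe.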



Answer 2:

Building on Nomad's answer. You can avoid the creation of all but one pyc file during development by adding:

import sys
sys.dont_write_bytecode = True

to the project's "__init__.py" file.

This prevents .pyc files from being created, which is especially useful if you rename a spider's file while working on a project: the cached .pyc of the old spider no longer lingers, and it avoids a few other gotchas.
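
If you want to check whether a leftover .pyc from a renamed or deleted spider is the culprit, a rough sketch like the following may help. The name find_orphaned_pyc is hypothetical, and the sketch assumes the stale files sit next to where their .py source used to be; it lists every *.pyc that has no matching .py file beside it.

```python
from pathlib import Path


def find_orphaned_pyc(project_root):
    """Return *.pyc files that have no .py source file next to them.

    find_orphaned_pyc is a hypothetical helper name, not part of Scrapy.
    """
    orphans = []
    for pyc in Path(project_root).rglob("*.pyc"):
        # Files inside __pycache__ are skipped: Python 3 does not import
        # those when the source file is missing.
        if pyc.parent.name == "__pycache__":
            continue
        if not pyc.with_suffix(".py").exists():
            orphans.append(pyc)
    return sorted(orphans)


# Example (assumes the current directory is the project root):
# for path in find_orphaned_pyc("."):
#     print("no source for", path)
```

Anything this reports is a candidate for deletion, since Python can still import a .pyc that sits alone in a package folder.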