Running scrapy from script (beginner)

Posted 2019-05-22 06:25

Question:

I am just getting started with Python and yes, I have searched this site and the web for an answer, but somehow I really can't get it to run.

I've created a spider class EbaySpider, residing in spider/ebay.py, that I can start from the command line without problems (even with output to a JSON file). Now I want to start Scrapy from within another .py file, so I can directly access the crawled data and output it to a GUI (I will think about how to do that later).

I have taken the code from this question (the asker's code, as I don't need to run the spider multiple times) and added

from spiders import ebay
from scrapy.crawler import CrawlerProcess

to the beginning, to have all the necessary resources at hand.

The error I get is

ImportError: cannot import name ebay.

Naturally, I have played around with the import statement, changing it from 'ebay' to 'EbaySpider', and changing 'spiders' to 'spiders.ebay' or 'projectname.spiders.ebay', but none of them seem to work.

It would be great if you could tell me how to fix this problem, or show me another way to run the spider and then access the crawled data from within my Python program. I'm happy with anything that works and is halfway understandable :)

Thanks people!

Answer 1:

Basically, you have three options:

  • Install the 'spiders' directory as a module on your PYTHONPATH.
  • Put the 'ebay.py' file in the same directory as your script, and just import ebay.
  • Modify your Python path at runtime so Python can find your spider.

For the third option, you have to create a file __init__.py in the spiders directory. It can be empty. Then you have to modify your script as follows (assuming that spiders is a subdirectory of the directory your program is running from):

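import os
import sys

# Add the directory that contains 'spiders' (not 'spiders' itself) to the search path
sys.path.append(os.getcwd())
print(sys.path)  # optional: check that the directory was added
from spiders import ebay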
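
To see the third option work end to end, here is a minimal sketch that does not need Scrapy at all. It builds a throwaway 'spiders' package in a temporary directory, puts the parent directory on sys.path, and imports spiders.ebay. The helper import_ebay_from and the stand-in EbaySpider class are made up for illustration; they are not the asker's code.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_ebay_from(project_dir):
    """Import spiders.ebay after putting the project directory on sys.path."""
    project_dir = str(project_dir)
    if project_dir not in sys.path:
        # Add the directory that CONTAINS 'spiders', not 'spiders' itself.
        sys.path.insert(0, project_dir)
    importlib.invalidate_caches()
    return importlib.import_module("spiders.ebay")


# Build a throwaway project layout so the sketch runs on its own.
# The EbaySpider written here is a placeholder, not a real Scrapy spider.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "spiders").mkdir()
(demo_root / "spiders" / "__init__.py").write_text("")
(demo_root / "spiders" / "ebay.py").write_text(
    "class EbaySpider:\n    name = 'ebay'\n"
)

ebay = import_ebay_from(demo_root)
print(ebay.EbaySpider.name)
```

In a real project you would pass the directory that holds your own 'spiders' folder instead of the temporary one.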


Answer 2:

You can try Python's relative import feature to import modules from a directory relative to your script. The reason you are not able to import the module is that the spiders package is not on your PYTHONPATH.

from .spiders import ebay

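Note the dot before spiders. A relative import only works when the importing file is itself part of a package, for example when it is started with python -m rather than run directly as a script.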
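
A small sketch of when the relative import resolves, assuming Python 3.7 or later. The package name projectname and the file run.py are hypothetical; the layout is created in a temporary directory so the example is self-contained. The same file is started twice: once as a module inside its package, and once directly as a script.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: projectname/run.py next to projectname/spiders/ebay.py
root = Path(tempfile.mkdtemp())
pkg = root / "projectname"
(pkg / "spiders").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "spiders" / "__init__.py").write_text("")
(pkg / "spiders" / "ebay.py").write_text("NAME = 'ebay'\n")
(pkg / "run.py").write_text("from .spiders import ebay\nprint(ebay.NAME)\n")


def run(args):
    """Run a Python subprocess from the directory above the package."""
    return subprocess.run(
        [sys.executable, *args], cwd=root, capture_output=True, text=True
    )


as_module = run(["-m", "projectname.run"])  # run.py has a parent package
as_script = run([str(pkg / "run.py")])      # run.py is a top-level script

print("as module, exit code:", as_module.returncode)
print("as script, exit code:", as_script.returncode)
```

Started with -m, the import succeeds; started directly, Python raises an ImportError because the script has no parent package for the leading dot to refer to.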