I had a working Scrapy project, and then I decided to clean it up. To do so, I moved my database module out of the Scrapy part of the project, and now I can't import it anymore. The project now looks like this:
myProject/
    database/
        __init__.py
        model.py
        databaseFactory.py
    myScrapy/
        __init__.py
        settings.py
        myScrapy/
            __init__.py
            pipeline.py
            spiders/
                spiderA.py
                spiderB.py
    api/
        __init__.py
    config/
        __init__.py
(Only the files relevant to my question are shown.) I want to use databaseFactory from Scrapy.
I have added the following lines to my .bashrc:
PYTHONPATH=$PYTHONPATH:/my/path/to/my/project
export PYTHONPATH
so when I launch IPython, I can do the following:
In [1]: import database.databaseFactory as databaseFactory
In [2]: databaseFactory
Out[2]: <module 'database.databaseFactory' from '/my/path/to/my/project/database/databaseFactory.pyc'>
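To see what PYTHONPATH actually does, here is a minimal Python 3 sketch (not part of the original post; `child_sys_path` is a hypothetical helper) that starts a fresh interpreter and reports its `sys.path`. An entry listed in PYTHONPATH shows up in the child's `sys.path`, which is why the import works in IPython; if the variable is missing from the child's environment, the entry is gone and the import fails.

```python
import os
import subprocess
import sys
import tempfile


def child_sys_path(pythonpath=None):
    """Return sys.path as reported by a fresh Python interpreter.

    If pythonpath is given, it becomes the child's PYTHONPATH; otherwise
    PYTHONPATH is removed from the child's environment, roughly what
    happens when a command is started through sudo with a reset environment.
    """
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
        env=env,
    )
    return out.decode().splitlines()


if __name__ == "__main__":
    # A temporary directory stands in for /my/path/to/my/project.
    with tempfile.TemporaryDirectory() as project:
        print(project in child_sys_path(project))  # True: entry is on sys.path
        print(project in child_sys_path())         # False: PYTHONPATH was dropped
```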
BUT...
when I try to launch the crawl with
sudo scrapy crawl spiderName 2> error.log
I get the following message:
Traceback (most recent call last):
File "/usr/local/bin/scrapy", line 11, in <module>
sys.exit(execute())
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 143, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 89, in _run_print_help
func(*a, **kw)
File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
cmd.run(args, opts)
File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 60, in run
self.crawler_process.start()
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 92, in start
if self.start_crawling():
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 124, in start_crawling
return self._start_crawler() is not None
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 139, in _start_crawler
crawler.configure()
File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 47, in configure
self.engine = ExecutionEngine(self, self._spider_closed)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 65, in __init__
self.scraper = Scraper(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
self.itemproc = itemproc_cls.from_crawler(crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
return cls.from_settings(crawler.settings, crawler)
File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
mwcls = load_object(clspath)
File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 42, in load_object
raise ImportError("Error loading object '%s': %s" % (path, e))
ImportError: Error loading object 'myScrapy.pipelines.QueueExportPipe': No module named database.databaseFactory
Why does Scrapy ignore my PYTHONPATH? What should I do now? I really don't want to use sys.path.append() in my code.
You have to tell Python your PYTHONPATH (export it in the shell session you start Scrapy from) and then run scrapy from that same session.
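A minimal Python sketch of the same idea, assuming you prefer a small launcher over editing .bashrc: it prepends the project root (the directory that contains `database/`) to PYTHONPATH for a single `scrapy crawl` run started without sudo. `env_with_project` and `crawl` are hypothetical helper names, not part of Scrapy.

```python
import os
import subprocess


def env_with_project(project_root, base_env=None):
    """Return a copy of base_env (default: os.environ) with project_root
    prepended to PYTHONPATH, keeping any entries that were already there."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = (
        project_root if not existing else project_root + os.pathsep + existing
    )
    return env


def crawl(spider, project_root):
    """Run `scrapy crawl <spider>` (no sudo) with the project importable."""
    return subprocess.run(
        ["scrapy", "crawl", spider],
        env=env_with_project(project_root),
        check=True,
    )
```

Usage would be something like `crawl("spiderName", "/my/path/to/my/project")`. Passing `env=` to `subprocess.run` affects only that child process, so nothing leaks into the rest of your shell.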
What's wrong with using sys.path.append()? I tried many other ways and concluded that Scrapy doesn't honor $PYTHONPATH for user-defined packages. I suspect it loads the directory after the framework has passed the lookup phase. But sys.path.append() does work.
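If you do go the sys.path route, one way to keep it out of the spider and pipeline code is to do it once in settings.py, which Scrapy imports before it loads the item pipelines. The sketch below assumes settings.py sits one directory below the folder that contains `database/`; `add_project_root` is a hypothetical helper, and it uses `sys.path.insert(0, ...)` rather than `append()` so the local package takes precedence over any same-named installed module.

```python
import os
import sys


def add_project_root(settings_file, levels_up=1):
    """Put the directory `levels_up` levels above settings_file's folder
    at the front of sys.path (once) and return it."""
    root = os.path.dirname(os.path.abspath(settings_file))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# e.g. at the top of myScrapy/settings.py (assumed location):
# add_project_root(__file__)
```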
By default, a command launched with sudo does not run in your normal environment, so PYTHONPATH is lost. To keep PYTHONPATH when using sudo, follow these steps:
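A sketch of one way to hand PYTHONPATH to a command run under sudo, not necessarily the steps this answer had in mind: let `env`, which itself runs as root, set the variable for scrapy. `sudo_crawl_command` is a hypothetical helper that only builds the command list (run it with `subprocess.run`); `shlex.join` needs Python 3.8+. Other commonly suggested options are adding `Defaults env_keep += "PYTHONPATH"` to sudoers via `visudo`, or simply not running scrapy as root.

```python
import os
import shlex


def sudo_crawl_command(spider, pythonpath=None):
    """Build a command that re-supplies PYTHONPATH to scrapy under sudo.

    sudo normally resets the environment, so the value is passed as an
    explicit argument to `env`, which sets it before exec'ing scrapy.
    Defaults to the caller's current PYTHONPATH.
    """
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    return ["sudo", "env", "PYTHONPATH=" + pythonpath, "scrapy", "crawl", spider]


if __name__ == "__main__":
    cmd = sudo_crawl_command("spiderName", "/my/path/to/my/project")
    print(shlex.join(cmd))
    # sudo env PYTHONPATH=/my/path/to/my/project scrapy crawl spiderName
```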