scrapy cannot import module while it's in my PYTHONPATH

Posted 2019-06-01 14:16

Question:

I had a working Scrapy project, and then I decided to clean it up. To do so, I moved my database module out of the Scrapy part of the project, and now I can't import it anymore. The project now looks like this:

myProject/
    database/
        __init__.py
        model.py
        databaseFactory.py
    myScrapy/
        __init__.py
        settings.py
        myScrapy/
            __init__.py
            pipeline.py
        spiders/
            spiderA.py
            spiderB.py
    api/
        __init__.py
    config/
        __init__.py

(Only the files related to my question are shown.) I want to use databaseFactory in Scrapy.
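
For context, the pipeline that fails later does the import at module level, roughly like this (the class name is taken from the traceback below; the rest of the body is only a sketch):

    # myScrapy/myScrapy/pipeline.py -- sketch; only the import line matters here
    import database.databaseFactory as databaseFactory

    class QueueExportPipe(object):
        def process_item(self, item, spider):
            # hypothetical use of the database layer; the real body is not shown
            return item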

I have added the following lines to my .bashrc:

PYTHONPATH=$PYTHONPATH:my/path/to/my/project
export PYTHONPATH

so when I launch ipython I can do the following:

In [1]: import database.databaseFactory as databaseFactory

In [2]: databaseFactory
Out[2]: <module 'database.databaseFactory' from '/my/path/to/my/project/database/databaseFactory.pyc'>

BUT...

when I try to launch the crawl with

sudo scrapy crawl spiderName 2> error.log

I get to enjoy the following message:

Traceback (most recent call last):
  File "/usr/local/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/commands/crawl.py", line 60, in run
    self.crawler_process.start()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 92, in start
    if self.start_crawling():
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 124, in start_crawling
    return self._start_crawler() is not None
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 139, in _start_crawler
    crawler.configure()
  File "/usr/local/lib/python2.7/dist-packages/scrapy/crawler.py", line 47, in configure
    self.engine = ExecutionEngine(self, self._spider_closed)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/engine.py", line 65, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/core/scraper.py", line 66, in __init__
    self.itemproc = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 50, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/middleware.py", line 29, in from_settings
    mwcls = load_object(clspath)
  File "/usr/local/lib/python2.7/dist-packages/scrapy/utils/misc.py", line 42, in load_object
    raise ImportError("Error loading object '%s': %s" % (path, e))
ImportError: Error loading object 'myScrapy.pipelines.QueueExportPipe': No module named database.databaseFactory

Why does scrapy ignore my PYTHONPATH? What do I do now? I really don't want to use sys.path.append() in my code.

Answer 1:

You have to tell Python about your PYTHONPATH:

export PYTHONPATH=/path/to/myProject/

and then run scrapy:

sudo scrapy crawl spiderName 2> error.log
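
Note that the crawl in the question is run through sudo, which may not carry the exported variable over (see the next answer). A quick sanity check without sudo, assuming a POSIX shell and that /path/to/myProject/ is the real project root:

    export PYTHONPATH=$PYTHONPATH:/path/to/myProject/
    python -c "import database.databaseFactory as f; print(f.__file__)"
    scrapy crawl spiderName 2> error.log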


Answer 2:

By default, when a command is launched with sudo, the caller's environment is not carried over, so PYTHONPATH is lost. To keep PYTHONPATH available under sudo, follow these steps (a minimal sudoers excerpt follows the list):

  • add PYTHONPATH to the Defaults env_keep += "ENV1 ENV2 ..." line in the sudoers file
  • remove Defaults !env_reset from the sudoers file if it is present
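
For reference, a minimal sketch of the sudoers change described above; edit the file with visudo and keep whatever else your sudoers already contains:

    # excerpt from /etc/sudoers (edit with visudo)
    Defaults    env_keep += "PYTHONPATH"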


Answer 3:

What's wrong with using sys.path.append()? I tried many other ways and found that scrapy doesn't honor $PYTHONPATH for user-defined packages. I suspect it loads the directory after the framework has already passed the lookup phase. But I tried sys.path.append(), and it works.
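
For reference, a minimal sketch of that workaround, assuming it sits at the top of the pipeline module (which, in the layout above, lives two directories below the project root):

    # top of myScrapy/myScrapy/pipeline.py -- sys.path workaround (sketch)
    import os
    import sys

    # walk up from .../myProject/myScrapy/myScrapy/pipeline.py to .../myProject
    PROJECT_ROOT = os.path.dirname(
        os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    if PROJECT_ROOT not in sys.path:
        sys.path.append(PROJECT_ROOT)

    import database.databaseFactory as databaseFactory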

Jun