How to use PyCharm to debug Scrapy projects

Posted 2019-01-15 23:58

I am working with Scrapy 0.20 and Python 2.7. I found that PyCharm has a good Python debugger, and I want to test my Scrapy spiders with it. Does anyone know how to do that?

What I have tried

I tried to run the spider as a script, and I built that script. Then I tried to add my Scrapy project to PyCharm as a module, like this:

File -> Settings -> Project Structure -> Add Content Root.

But I don't know what else I have to do.

9 Answers
Summer. ? 凉城
#2 · 2019-01-16 00:40

I use this simple script:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

# Loads the project settings; run this from the project root (where scrapy.cfg lives)
process = CrawlerProcess(get_project_settings())

process.crawl('your_spider_name')  # the name attribute of your spider
process.start()  # blocks here until the crawl finishes
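Presumably you then point a PyCharm Run/Debug Configuration at this script (with the project root, where scrapy.cfg lives, as the working directory), set breakpoints in your spider, and start it with Debug instead of Run.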
Emotional °昔
#3 · 2019-01-16 00:41

I am also using PyCharm, but I am not using its built-in debugging features.

For debugging I use ipdb. I set up a keyboard shortcut that inserts import ipdb; ipdb.set_trace() on any line where I want a breakpoint.

Then I can type n to execute the next statement, s to step into a function, type any object's name to see its value, alter the execution environment, type c to continue execution...

This is very flexible, and it works in environments other than PyCharm, where you don't control the execution environment.

Just run pip install ipdb in your virtual environment and place import ipdb; ipdb.set_trace() on the line where you want execution to pause.
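As a minimal sketch (the spider name, start URL, and selector are just placeholders, assuming a reasonably recent Scrapy), it could look like this:

import scrapy
import ipdb  # pip install ipdb

class ExampleSpider(scrapy.Spider):
    name = "example"  # placeholder spider name
    start_urls = ["http://quotes.toscrape.com/"]  # placeholder URL

    def parse(self, response):
        ipdb.set_trace()  # execution pauses here; inspect `response`, then type c to continue
        for text in response.css("div.quote span.text::text").extract():
            yield {"text": text}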

放荡不羁爱自由
#4 · 2019-01-16 00:48

I am running Scrapy in a virtualenv with Python 3.5.0, and setting the "script" parameter to /path_to_project_env/env/bin/scrapy solved the issue for me.
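For reference, the Run/Debug Configuration would look roughly like this (your_spider_name and the paths are placeholders):

Script:            /path_to_project_env/env/bin/scrapy
Parameters:        crawl your_spider_name
Working directory: the project root (the folder containing scrapy.cfg)
Interpreter:       the Python from the same virtualenv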

家丑人穷心不美
#5 · 2019-01-16 00:51

You just need to do this.

Create a Python file in the crawler folder of your project. I used main.py.

  • Project
    • Crawler
      • Crawler
        • Spiders
        • ...
      • main.py
      • scrapy.cfg

Inside your main.py, put the code below.

from scrapy import cmdline
cmdline.execute("scrapy crawl spider".split())  # "spider" is your spider's name attribute
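If your spider takes arguments, cmdline.execute also accepts a list, so you can pass them with -a (category=books is just a hypothetical example):

from scrapy import cmdline
cmdline.execute(["scrapy", "crawl", "spider", "-a", "category=books"])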

Then you need to create a "Run Configuration" in PyCharm that runs your main.py.

With this setup, if you put a breakpoint in your code, execution will stop there.

你好瞎i
#6 · 2019-01-16 00:55

IntelliJ IDEA also works.

Create main.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from scrapy import cmdline


def main(name):
    if name:
        cmdline.execute(name.split())


if __name__ == '__main__':
    print('[*] beginning main thread')
    name = "scrapy crawl stack"  # "stack" is the spider's name attribute
    # name = "scrapy crawl spa"
    main(name)
    print('[*] main thread exited')

爷的心禁止访问
#7 · 2019-01-16 00:59

To add a bit to the accepted answer: after almost an hour I found that I had to select the correct Run Configuration from the drop-down list (near the center of the icon toolbar) and then click the Debug button to get it to work. Hope this helps!
