How to use PyCharm to debug Scrapy projects

Posted 2019-01-15 23:58

I am working with Scrapy 0.20 and Python 2.7. PyCharm has a good Python debugger, and I want to use it to test my Scrapy spiders. Does anyone know how to do that?

What I have tried

I tried to run the spider as a script, so I wrote that script. Then I tried to add my Scrapy project to PyCharm as a module like this:

File -> Settings -> Project Structure -> Add Content Root.

But I don't know what else I have to do.

9 answers
ゆ 、 Hurt°
#2 · 2019-01-16 01:03

The scrapy command is a Python script, which means you can start it from inside PyCharm.

When you examine the scrapy executable (locate it with which scrapy), you will notice that it is actually a Python script:

#!/usr/bin/python

from scrapy.cmdline import execute
execute()

This means that a command like scrapy crawl IcecatCrawler can also be executed like this: python /Library/Python/2.7/site-packages/scrapy/cmdline.py crawl IcecatCrawler

Find the file for the scrapy.cmdline module on your machine. In my case it was here: /Library/Python/2.7/site-packages/scrapy/cmdline.py

Create a run/debug configuration inside PyCharm with that file as the script. Fill the script parameters with the scrapy command and the spider name; in this case, crawl IcecatCrawler.

Like this: [Screenshot: PyCharm Run/Debug Configuration]

Put your breakpoints anywhere in your crawling code and it should work™.
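
A variation on the same idea, sketched under the assumption that Scrapy is installed in the interpreter PyCharm uses: instead of pointing the run configuration at cmdline.py inside site-packages, a small runner script in the project root can call scrapy.cmdline.execute itself. The file name run_spider.py and the helper build_argv are hypothetical; IcecatCrawler is the spider name from the example above.

```python
# run_spider.py (hypothetical name): place next to scrapy.cfg and debug it in PyCharm.
import sys


def build_argv(command, *args):
    """Build the argv list that the `scrapy` command line would receive."""
    # The first element plays the role of the program name, as in sys.argv.
    return ["scrapy", command] + list(args)


def main(args):
    # Imported lazily so this file can be imported without Scrapy installed.
    from scrapy.cmdline import execute

    # Equivalent to running `scrapy crawl <spider>` in a terminal.
    # execute() ends the process via sys.exit() when the command finishes.
    execute(build_argv("crawl", *args))


# Run only when a spider name is passed, e.g. `python run_spider.py IcecatCrawler`.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

In the run/debug configuration, select this file as the script and put IcecatCrawler in the script parameters.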

三岁会撩人
#3 · 2019-01-16 01:03

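According to the Scrapy documentation (https://doc.scrapy.org/en/latest/topics/practices.html), you can run a spider from a plain Python script, which PyCharm can then debug like any other script: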

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished
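
For a spider that already lives in a Scrapy project, a sketch along the same lines can load the project's settings instead of passing a settings dict by hand. It assumes Scrapy 1.0 or later (the CrawlerProcess API differs in 0.20), that the script runs from inside the project directory so scrapy.cfg is found, and that a spider named IcecatCrawler exists; debug_spider is a hypothetical helper name.

```python
def debug_spider(spider_name):
    """Run one spider of the current project in this process, so breakpoints are hit."""
    # Imported lazily; both imports require Scrapy to be installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    # Picks up the project's settings.py (located via scrapy.cfg).
    process = CrawlerProcess(get_project_settings())
    # With project settings loaded, a spider can be referenced by its `name` attribute.
    process.crawl(spider_name)
    process.start()  # blocks until the crawl is finished


# Example call (spider name assumed), to be placed in a script that PyCharm debugs:
# debug_spider("IcecatCrawler")
```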
再贱就再见
#4 · 2019-01-16 01:04

As of PyCharm 2018.1 this became a lot easier. You can now select Module name in your project's Run/Debug Configuration. Set this to scrapy.cmdline and set the Working directory to the root directory of the Scrapy project (the one with settings.py in it). The Parameters field takes the scrapy command, for example crawl IcecatCrawler.

Like so:

[Screenshot: PyCharm Scrapy debug configuration]

Now you can add breakpoints to debug your code.
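
If Scrapy does not seem to recognise the project when started this way, the working directory is a likely cause. The check below is a rough sketch, not part of Scrapy or PyCharm: it walks up from a directory looking for scrapy.cfg, the file Scrapy uses to recognise a project. The function name find_scrapy_project_root is hypothetical.

```python
import os


def find_scrapy_project_root(start_dir):
    """Return the closest directory at or above start_dir containing scrapy.cfg, or None."""
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, "scrapy.cfg")):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


# Example: check the directory the run configuration starts in.
# print(find_scrapy_project_root(os.getcwd()))
```

A result of None suggests the Working directory field points outside the project.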
