I have been trying to understand the concepts of BaseSpider and CrawlSpider in web scraping. I have read the docs, but there is no mention of BaseSpider. It would be really helpful if someone could explain the differences between BaseSpider and CrawlSpider.
`BaseSpider` existed in older versions of Scrapy and has been deprecated since 0.22 - use `scrapy.Spider` instead.

`scrapy.Spider` is the simplest spider: it basically just visits the URLs defined in `start_urls` (or returned by `start_requests()`) and passes the responses to your `parse()` callback.
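A minimal sketch of such a spider (the spider name, site, and CSS selectors below are illustrative, not part of the original answer):

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    # scrapy.Spider: Scrapy requests each URL in start_urls
    # and passes every response to parse().
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Extract the text of each quote block on the page.
        for quote in response.css("div.quote"):
            yield {"text": quote.css("span.text::text").get()}
```

It does not follow any links on its own; it only processes the pages you explicitly request.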
Use `CrawlSpider` when you need "crawling" behavior - extracting the links from each page and following them:
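A rough sketch of a `CrawlSpider`, assuming a site with `/page/N/` pagination links (the name, URL, and pattern are just examples):

```python
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class FollowLinksSpider(CrawlSpider):
    # CrawlSpider automatically follows links matched by its rules.
    name = "follow_links"
    start_urls = ["https://quotes.toscrape.com/"]

    rules = (
        # Follow pagination links and parse every page they lead to.
        Rule(LinkExtractor(allow=r"/page/\d+/"), callback="parse_page", follow=True),
    )

    def parse_page(self, response):
        # Note: CrawlSpider uses parse() internally for its crawling logic,
        # so the callback must have a different name.
        yield {"url": response.url, "title": response.css("title::text").get()}
```

The key difference is the `rules` attribute: each `Rule` pairs a `LinkExtractor` (which links to extract) with an optional callback and a `follow` flag, so the crawl keeps expanding to matching links without you issuing requests manually.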