Scrapy Follow & Scrape next Pages

Posted 2019-07-17 04:33

I am having a problem where none of my Scrapy spiders will crawl a website; each one just scrapes a single page and stops. I was under the impression that the rules member variable was responsible for following links, but I can't get it to follow any. I have been following the documentation here: http://doc.scrapy.org/en/latest/topics/spiders.html#crawlspider

What could I be missing that is preventing my spiders from crawling beyond the first page?

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from scrapy.selector import Selector

from Example.items import ExItem

class ExampleSpider(CrawlSpider):
    name = "example"
    allowed_domains = ["example.ac.uk"]
    start_urls = (
        'http://www.example.ac.uk',
    )

    rules = ( Rule (LinkExtractor(allow=("", ),),
                    callback="parse_items",  follow= True),
    )
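
The parse_items callback referenced by the rule is not shown in the question; a minimal sketch of what it might look like (indented inside ExampleSpider, with the "title" field on ExItem being an assumption) is:

    def parse_items(self, response):
        # Hypothetical callback, not part of the original question:
        # grab the page title into the project's item class.
        sel = Selector(response)
        item = ExItem()
        item["title"] = sel.xpath("//title/text()").extract()
        yield item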

1 Answer

老娘就宠你
Answered 2019-07-17 04:51

Replace your rule with the one below. The allow pattern limits extraction to links whose URLs contain "course-finder", and restrict_xpaths confines link extraction to the pagination block, so the spider keeps following the next-page links instead of stopping after the first page:

    rules = (
        Rule(
            LinkExtractor(
                allow=('course-finder',),
                restrict_xpaths=('//div[@class="pagination"]',),
            ),
            callback='parse_items',
            follow=True,
        ),
    )
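
If in doubt whether that XPath matches the target pages (the pagination div class and the course-finder URL pattern both come from the answer, and the example URL below is only a guess), the extractor can be checked interactively in scrapy shell before running the spider:

# In a terminal (the URL is hypothetical):
#   scrapy shell 'http://www.example.ac.uk/course-finder'
# then, at the shell prompt:
from scrapy.contrib.linkextractors import LinkExtractor

le = LinkExtractor(allow=('course-finder',),
                   restrict_xpaths=('//div[@class="pagination"]',))
# Prints the links the CrawlSpider would queue and follow from this page.
print([link.url for link in le.extract_links(response)])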