INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

Posted 2019-08-24 05:46


I just started learning Python and Scrapy. My first project is to crawl a website that contains web security information. But when I run the spider from cmd, the log says "Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)" and no output is produced. I'd be grateful if someone could help me solve this.

My code:

import scrapy

class SapoSpider(scrapy.Spider):
    name = "imo"
    allowed_domains = [""]
    start_urls = [""]

    def parse(self, response):
        subpage_links = []
        for i in response.css('div.offer-item-details'):
            youritem = {
                'preco': i.css('span.offer-item title::text').extract_first(),
            }
            subpage_link = i.css('header[class=offer-item-header] a::attr(href)').extract()

            for subpage_link in subpage_links:
                yield scrapy.Request(subpage_link, callback=self.parse_subpage, meta={'item': youritem})

    def parse_subpage(self, response):
        for j in response.css('header[class=offer-item-header] a::attr(href)'):
            youritem = response.meta.get('item')
            youritem['info'] = j.css(' ul.dotted-list, li.h4::text').extract()
            yield youritem


There are two things to correct to make it work:

  • You need to define the `FEED_URI` setting with the path where you want to store the results.

  • You need to use `response` in `parse_subpage`, because the logic is as follows: Scrapy downloads "" and passes the response to `parse`; there you extract the ad URLs and ask Scrapy to download each of those pages and pass the downloaded pages to `parse_subpage`. So `response` in `parse_subpage` corresponds to one individual ad page.
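To make the callback chain in the second point concrete, here is a toy, stdlib-only model of it. This is not Scrapy's API: `Request`, `Response`, `crawl` and the in-memory `SITE` are hypothetical stand-ins, with dicts in place of HTML and CSS selectors. It only illustrates that each callback receives the response for the URL it was registered with, and that `meta` carries the partially filled item from `parse` to `parse_subpage`.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical in-memory "site": URL -> page content (stands in for real HTML)
SITE = {
    "listing": {"ads": ["ad/1", "ad/2"]},
    "ad/1": {"info": ["2 bedrooms"]},
    "ad/2": {"info": ["3 bedrooms"]},
}

@dataclass
class Request:
    url: str
    callback: Callable
    meta: Dict = field(default_factory=dict)

@dataclass
class Response:
    url: str
    body: Dict
    meta: Dict

def crawl(start_url, start_callback):
    """Toy engine: 'download' each request and feed the response to its callback."""
    queue = deque([Request(start_url, start_callback)])
    items = []
    while queue:
        req = queue.popleft()
        response = Response(req.url, SITE[req.url], req.meta)  # simulated download
        for result in req.callback(response):
            if isinstance(result, Request):
                queue.append(result)  # follow-up page to download later
            else:
                items.append(result)  # a finished item
    return items

def parse(response):
    # Here `response` is the listing page
    for link in response.body["ads"]:
        item = {"url": link}
        yield Request(link, parse_subpage, meta={"item": item})

def parse_subpage(response):
    # Here `response` is the ad page requested in parse, not the listing page
    item = response.meta["item"]
    item["info"] = response.body["info"]
    yield item

items = crawl("listing", parse)
print(items)  # [{'url': 'ad/1', 'info': ['2 bedrooms']}, {'url': 'ad/2', 'info': ['3 bedrooms']}]
```

If `parse` never yields a `Request` (for example because the list it loops over is empty), `crawl` returns no items, which is the toy equivalent of "scraped 0 items".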

This should work:

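import scrapy

class SapoSpider(scrapy.Spider):
    name = "imo"
    allowed_domains = [""]
    start_urls = [""]
    custom_settings = {
        # Write every yielded item to this file
        'FEED_URI': './output.json',
        # Export a single JSON array rather than JSON lines
        'FEED_FORMAT': 'json',
    }

    def parse(self, response):
        # Listing page: one div per ad
        for i in response.css('div.offer-item-details'):
            youritem = {
                'preco': i.css('span.offer-item title::text').extract_first(),
            }
            # Store the ad links in subpage_links so the loop below has something to follow
            subpage_links = i.css('header[class=offer-item-header] a::attr(href)').extract()

            for subpage_link in subpage_links:
                # urljoin handles relative hrefs; each request gets its own copy of the item
                yield scrapy.Request(response.urljoin(subpage_link),
                                     callback=self.parse_subpage,
                                     meta={'item': dict(youritem)})

    def parse_subpage(self, response):
        # Here `response` is the downloaded ad page itself
        youritem = response.meta.get('item')
        youritem['info'] = response.css(' ul.dotted-list, li.h4::text').extract()
        yield youritem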
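Since Scrapy 2.1, `FEED_URI` (and its companion `FEED_FORMAT`) are deprecated in favor of the `FEEDS` setting, which maps each output URI to its own options. A minimal sketch of the equivalent `custom_settings`, assuming Scrapy 2.1 or newer; the file name `output.json` is kept from the answer, and the encoding choice is an assumption:

```python
# Scrapy >= 2.1 style: one FEEDS entry per output file
# (replaces FEED_URI + FEED_FORMAT in the spider's custom_settings)
custom_settings = {
    "FEEDS": {
        "output.json": {
            "format": "json",    # single JSON array
            "encoding": "utf8",  # assumed; keeps non-ASCII text readable
        },
    },
}
```

Alternatively, the output file can be given on the command line with `scrapy crawl imo -o output.json`, without touching the spider's settings.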