How to crawl data from the linked webpages on a webpage

Posted 2019-07-28 04:51

I am crawling the names of the colleges on this webpage, but I also want to crawl the number of faculties in each college, which is only available if you open the college's own page by clicking its name.

What should I add to this code to get that? The result should be in the form [(name1, faculty1), (name2, faculty2), ...]

import scrapy
class QuotesSpider(scrapy.Spider):
    name = "student"
    start_urls = [
        'http://www.engineering.careers360.com/colleges/list-of-engineering-colleges-in-karnataka?sort_filter=alpha',
    ]

    def parse(self, response):
        for students in response.css('li.search-result'):
            yield {
                'name': students.css('div.title a::text').extract(),                   
            }

1 Answer

男人必须洒脱
Answered 2019-07-28 05:09
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "student"
    start_urls = [
        'http://www.engineering.careers360.com/colleges/list-of-engineering-colleges-in-karnataka?sort_filter=alpha',
    ]

    def parse(self, response):
        for students in response.css('li.search-result'):
            # SELECT_URL is a placeholder for the selector that picks the
            # college link, e.g. 'div.title a::attr(href)'.
            # urljoin turns a relative href into an absolute URL.
            url = response.urljoin(students.css(SELECT_URL).get())
            req = scrapy.Request(url, callback=self.parse_student)
            req.meta['name'] = students.css('div.title a::text').extract()
            yield req

    def parse_student(self, response):
        yield {
            'name': response.meta.get('name'),
            # SELECTOR is a placeholder for whatever picks the faculty
            # count on the college's own page.
            'other data': response.css(SELECTOR).get(),
        }

It should be something like this. You send the name of the college in the meta data of the request, which lets you read it back in the callback that parses the college's page.
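One detail worth noting: the href extracted from the listing is usually relative, so it has to be joined against the page URL before building the Request (inside Scrapy, `response.urljoin` does this). A minimal standard-library sketch, using a hypothetical relative link for illustration:

```python
from urllib.parse import urljoin

# The listing page URL from the spider above.
base = ('http://www.engineering.careers360.com/colleges/'
        'list-of-engineering-colleges-in-karnataka?sort_filter=alpha')

# Hypothetical relative href, as it might appear in the college link.
href = '/colleges/example-college-of-engineering'

# Join it into the absolute URL that a Request needs.
url = urljoin(base, href)
print(url)
# http://www.engineering.careers360.com/colleges/example-college-of-engineering
```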

If the data is also available on the page you scrape in parse_student, you might want to consider not sending it in the meta data and instead scraping both fields from that page.
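The meta round-trip itself can be seen without running a crawl. This is a toy simulation, not Scrapy's real classes: FakeRequest and FakeResponse are stand-ins that only mimic how `request.meta` becomes `response.meta` inside the callback, and the faculty value is made up:

```python
# Offline simulation of the meta-passing pattern (no network, no Scrapy).
class FakeRequest:
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback
        self.meta = {}  # extra data rides along with the request

class FakeResponse:
    def __init__(self, request):
        # Scrapy exposes the originating request's meta on the response.
        self.meta = request.meta

def parse_student(response):
    # The callback reads back the name stored before the request was sent.
    return {'name': response.meta.get('name'), 'faculty': 120}

req = FakeRequest('http://example.com/college/1', parse_student)
req.meta['name'] = 'Example College'
item = req.callback(FakeResponse(req))
print(item)
# {'name': 'Example College', 'faculty': 120}
```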
