How to send another request and get the result in scrapy's parse function

Posted 2019-07-23 11:21

I'm analyzing an HTML page that has a two-level menu. When the top-level menu selection changes, an AJAX request is sent to fetch the second-level menu items. Once both the top-level and second-level menus are selected, the content is refreshed.

What I need is to send another request and get the submenu response inside Scrapy's parse function, so that I can iterate over the submenu and build a scrapy.Request per submenu item.

The pseudocode looks like this:

def parse(self, response):
    top_level_menu = response.xpath('//TOP_LEVEL_MENU_XPATH')
    second_level_menu_items = ## HERE I NEED TO SEND A REQUEST AND GET THE RESULT, PARSED INTO A LIST OF ITEM VALUES

    for second_menu_item in second_level_menu_items:
        yield scrapy.Request(response.urljoin(content_request_url + '?top_level=' + top_level_menu + '&second_level_menu=' + second_menu_item), callback=self.parse_content)

How can I do this?

Should I use the requests library directly, or is there some other feature provided by Scrapy?

2 Answers
小情绪 Triste *
#2 · 2019-07-23 11:52

Simply use dont_filter=True on your Request so Scrapy's duplicate filter doesn't drop it. For example:

def start_requests(self):
    return [Request(url=self.base_url, callback=self.parse_city)]

def parse_city(self, response):
    for next_page in response.css('a.category'):
        url = self.base_url + next_page.attrib['href']
        self.log(url)
        yield Request(url=url, callback=self.parse_something_else, dont_filter=True)

def parse_something_else(self, response):
    for next_page in response.css('#contentwrapper > div > div > div.component > table > tbody > tr:nth-child(2) > td > form > table > tbody > tr'):
        url = self.base_url + next_page.attrib['href']
        self.log(url)
        yield Request(url=url, callback=self.parse, dont_filter=True)

def parse(self, response):
    pass
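As a side note, the manual base_url + href concatenation above breaks if the site ever emits root-relative or absolute hrefs; Scrapy's response.urljoin (built on the standard library's urljoin) resolves both cases correctly. A quick stdlib illustration (the URLs here are made up):

```python
from urllib.parse import urljoin

base = 'http://example.com/cities/list'

# A root-relative href replaces the whole path of the base URL.
print(urljoin(base, '/category/food'))  # http://example.com/category/food

# A relative href is resolved against the base page's directory.
print(urljoin(base, 'food'))            # http://example.com/cities/food
```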
趁早两清
#3 · 2019-07-23 11:56

The recommended approach here is to create another callback (parse_second_level_menus?) to handle the response for the second-level menu items and, in there, create the requests to the content pages.

Also, you can use the request.meta attribute to pass data between callback methods.

It could be something along these lines:

def parse(self, response):
    top_level_menu = response.xpath('//TOP_LEVEL_MENU_XPATH').get()
    yield scrapy.Request(
        some_url,
        callback=self.parse_second_level_menus,
        # pass the top_level_menu value to the other callback
        meta={'top_menu': top_level_menu},
    )

def parse_second_level_menus(self, response):
    # read the data passed in the meta by the first callback
    top_level_menu = response.meta.get('top_menu')
    second_level_menu_items = response.xpath('...').getall()

    for second_menu_item in second_level_menu_items:
        url = response.urljoin(content_request_url + '?top_level=' + top_level_menu + '&second_level_menu=' + second_menu_item)
        yield scrapy.Request(
            url,
            callback=self.parse_content,
        )

def parse_content(self, response):
    ...
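One hardening note: the manual '?top_level=' + ... concatenation in parse_second_level_menus produces broken URLs as soon as a menu value contains spaces, '&', or non-ASCII characters. A small sketch using the standard library's urlencode (the helper name build_menu_url is illustrative, not part of Scrapy):

```python
from urllib.parse import urlencode

def build_menu_url(base_url, top_level, second_menu_item):
    # urlencode percent-escapes the values, so spaces, '&', and
    # non-ASCII menu labels survive the round trip.
    query = urlencode({
        'top_level': top_level,
        'second_level_menu': second_menu_item,
    })
    return f'{base_url}?{query}'

print(build_menu_url('http://example.com/content', 'food & drink', 'cafes'))
# http://example.com/content?top_level=food+%26+drink&second_level_menu=cafes
```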

Yet another approach (less recommended in this case) would be using this library: https://github.com/rmax/scrapy-inline-requests
