I'm analyzing an HTML page that has a two-level menu. When the top-level menu selection changes, an AJAX request is sent to fetch the second-level menu items. When both the top-level and second-level items are selected, the content is refreshed.

What I need is to send another request and get the submenu response inside Scrapy's `parse` function, so that I can iterate over the submenu and build a `scrapy.Request` per submenu item.
The pseudocode looks like this:

```python
def parse(self, response):
    top_level_menu = response.xpath('//TOP_LEVEL_MENU_XPATH').get()
    second_level_menu_items = ...  # HERE I NEED TO SEND A REQUEST AND GET THE RESULT, PARSED INTO AN ITEM VALUE LIST
    for second_menu_item in second_level_menu_items:
        yield scrapy.Request(
            response.urljoin(content_request_url
                             + '?top_level=' + top_level_menu
                             + '&second_level_menu=' + second_menu_item),
            callback=self.parse_content)
```
How can I do this? Should I use the `requests` library directly, or is there some other feature provided by Scrapy?
Simply use `dont_filter=True` for your Request. For example:
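A minimal sketch of that, reusing the placeholder names from the question's pseudocode; `dont_filter=True` tells Scrapy's duplicate-request filter not to drop the request even if the same URL has already been seen:

```python
yield scrapy.Request(
    response.urljoin(content_request_url
                     + '?top_level=' + top_level_menu
                     + '&second_level_menu=' + second_menu_item),
    callback=self.parse_content,
    dont_filter=True,  # bypass Scrapy's duplicate request filter for this URL
)
```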
The recommended approach here is to create another callback (`parse_second_level_menus`?) to handle the response for the second-level menu items and, in there, create the requests to the content pages. Also, you can use the `request.meta` attribute to pass data between callback methods (more info here). It could be something along these lines:
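A sketch of that approach under stated assumptions: the start URL, `submenu_request_url`, `content_request_url`, and the XPaths are placeholders, not taken from the actual site.

```python
import scrapy


class MenuSpider(scrapy.Spider):
    name = 'menu'
    start_urls = ['http://example.com/']                  # placeholder start page
    submenu_request_url = 'http://example.com/submenu'    # hypothetical AJAX endpoint
    content_request_url = 'http://example.com/content'    # hypothetical content endpoint

    def parse(self, response):
        # One request per top-level menu item; carry the item along in meta.
        for top_level_menu in response.xpath('//TOP_LEVEL_MENU_XPATH').getall():
            yield scrapy.Request(
                self.submenu_request_url + '?top_level=' + top_level_menu,
                callback=self.parse_second_level_menus,
                meta={'top_level_menu': top_level_menu},
            )

    def parse_second_level_menus(self, response):
        # Parse the AJAX response into a list of submenu values (placeholder XPath).
        top_level_menu = response.meta['top_level_menu']
        second_level_menu_items = response.xpath('//SECOND_LEVEL_MENU_XPATH').getall()
        for second_menu_item in second_level_menu_items:
            yield scrapy.Request(
                self.content_request_url
                + '?top_level=' + top_level_menu
                + '&second_level_menu=' + second_menu_item,
                callback=self.parse_content,
            )

    def parse_content(self, response):
        # Extract the actual data from the content page here.
        pass
```

Chaining callbacks this way keeps each response handler small, and `meta` carries the selected top-level value down to the content request.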
Yet another approach (less recommended in this case) would be to use this library: https://github.com/rmax/scrapy-inline-requests
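A sketch of how that could look, following the usage pattern from the library's README (the endpoint paths and XPaths below are hypothetical): the `@inline_requests` decorator lets you yield a Request and receive its response right back inside the same callback.

```python
import scrapy
from inline_requests import inline_requests  # pip install scrapy-inline-requests


class InlineMenuSpider(scrapy.Spider):
    name = 'menu_inline'
    start_urls = ['http://example.com/']  # placeholder start page

    @inline_requests
    def parse(self, response):
        top_level_menu = response.xpath('//TOP_LEVEL_MENU_XPATH').get()

        # Yielding a Request here suspends the generator and resumes it
        # with the downloaded response, so the code reads sequentially.
        submenu_response = yield scrapy.Request(
            response.urljoin('/submenu?top_level=' + top_level_menu))  # hypothetical AJAX endpoint

        for second_menu_item in submenu_response.xpath('//SECOND_LEVEL_MENU_XPATH').getall():
            content_response = yield scrapy.Request(
                response.urljoin('/content?top_level=' + top_level_menu
                                 + '&second_level_menu=' + second_menu_item))  # hypothetical endpoint
            yield {
                'top_level': top_level_menu,
                'second_level': second_menu_item,
                'content': content_response.text,
            }
```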