All the Scrapy examples I found talk about how to crawl a single page, pages with the same URL schema, or all the pages of a website. I need to crawl a series of pages A, B, C where A contains the link to B, and so on. For example, the website structure is:
A
----> B
---------> C
D
E
I need to crawl all the C pages, but to get the link to C I first need to crawl A and then B. Any hints?
See the Scrapy Request documentation: to crawl such a chain you'll have to use the callback parameter, chaining from one parse method to the next.
Here is an example spider I wrote for a project of mine:
I think the parse method is what you are after: it looks at every link on the start_urls page, then uses a regex to decide whether it is a relevant_url (i.e. a URL I would like to scrape). If it is relevant, it scrapes the page using yield Request(url, callback=self.parse_page), which calls the parse_page method.
Is this the kind of thing you are after?