I have a crawler that looks something like this:
def parse(self, response):
    ...
    yield Request(url=nextUrl, callback=self.parse2)

def parse2(self, response):
    ...
    yield Request(url=nextUrl, callback=self.parse3)

def parse3(self, response):
    ...
I want to add a rule so that a URL is ignored if it has already been crawled when invoking parse2, but keep the rule for parse3. I am still exploring the requests.seen file to see whether I can manipulate it.
Check out the dont_filter request parameter at http://doc.scrapy.org/en/latest/topics/request-response.html
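To illustrate what dont_filter does: Scrapy's scheduler consults a duplicate filter before enqueueing a request, and dont_filter=True bypasses that check for a single request. Here is a minimal sketch of that behavior using a simplified stand-in for scrapy.Request and the scheduler (the Request and Scheduler classes below are toy illustrations, not Scrapy's actual classes):

```python
from dataclasses import dataclass

@dataclass
class Request:
    # Simplified stand-in for scrapy.Request: just a URL plus the
    # dont_filter flag that the real class also accepts.
    url: str
    dont_filter: bool = False

class Scheduler:
    """Toy scheduler mimicking how Scrapy consults its dupe filter."""
    def __init__(self):
        self.seen = set()   # URLs already scheduled
        self.queue = []

    def enqueue(self, request):
        # dont_filter=True skips the duplicate check entirely,
        # mirroring what Scrapy's scheduler does.
        if not request.dont_filter:
            if request.url in self.seen:
                return False          # duplicate: dropped
            self.seen.add(request.url)
        self.queue.append(request)
        return True

scheduler = Scheduler()
scheduler.enqueue(Request("http://example.com/a"))                     # accepted
scheduler.enqueue(Request("http://example.com/a"))                     # filtered out
scheduler.enqueue(Request("http://example.com/a", dont_filter=True))   # accepted again
```

So in the spider above, yielding Request(url=nextUrl, callback=self.parse3, dont_filter=True) would let that request through even if the URL was already seen, while requests without the flag stay subject to the filter.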
You can also set this globally in settings.py. Refer to the docs for the DUPEFILTER_CLASS setting.
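If you need the filtering to depend on which callback a request targets, a custom dupe filter is one way to do it. A hedged sketch, assuming a project named myproject (RFPDupeFilter and request_seen are real Scrapy APIs, but the subclass name, module path, and callback-name check are illustrative):

```python
# myproject/dupefilters.py (hypothetical module)
from scrapy.dupefilters import RFPDupeFilter

class SelectiveDupeFilter(RFPDupeFilter):
    # Only apply duplicate filtering to requests whose callback is
    # parse3; requests headed elsewhere are never treated as seen.
    def request_seen(self, request):
        cb_name = getattr(request.callback, "__name__", None)
        if cb_name != "parse3":
            return False  # never filter these requests
        return super().request_seen(request)
```

Then point Scrapy at it in settings.py:

```python
# settings.py
DUPEFILTER_CLASS = "myproject.dupefilters.SelectiveDupeFilter"
```

This is a sketch, not a drop-in solution; per-request dont_filter (as in the other answer) is simpler if only a few requests need to bypass the filter.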