So I'm trying to test some webpages in Scrapy. My idea is to yield a Request to the URLs that satisfy the condition, count the number of certain items on the page, and then, within the original condition, return True/False depending on the count...
Here is some code to show what I mean:
def filter_categories(self):
    if condition:
        test = yield Request(url=link, callback=self.test_page, dont_filter=True)
        return (test, None)

def test_page(self, response):
    # ... parse the response ...
    return True  # or False, depending
I have tried messing around with passing an item in the request, but no matter what, the return line gets triggered before test_page is ever called...
So I guess my question becomes: is there any way to pass data back to the filter_categories method synchronously, so that I can use the result of test_page to return whether or not my test is satisfied?
Any other ideas are also welcome.
If I understood you correctly: you want to yield a scrapy.Request for the URLs that satisfy a True condition. Am I right? If you give more info, I'll try to help more.
It's part of my code.
Take a look at the inline_requests package, which should let you achieve this.
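The trick inline_requests uses is driving your generator for you: each yielded request is fetched, and the response is sent back into the generator as the value of the yield expression, so the code reads synchronously. A minimal, Scrapy-free sketch of that mechanism (the drive helper, the condition, and the fake_pages fetch stub are all hypothetical stand-ins, not part of the package's API):

```python
def drive(gen, fetch):
    """Minimal trampoline mimicking inline_requests: each yielded URL is
    fetched, and the response is sent back into the generator, so the
    generator body can use the result inline."""
    try:
        request = next(gen)
        while True:
            response = fetch(request)
            request = gen.send(response)
    except StopIteration as stop:
        # the generator's return value becomes the overall result
        return stop.value

def filter_categories(link):
    # yield behaves like a blocking fetch: the response comes back
    # as the value of the yield expression
    response = yield link
    # hypothetical condition: page passes if it mentions "item" 3+ times
    return response.count("item") >= 3

fake_pages = {"https://example.com/cat": "item item item item"}
result = drive(filter_categories("https://example.com/cat"),
               lambda url: fake_pages[url])
print(result)  # True
```

With the real package you would decorate a spider method with @inline_requests and write `response = yield Request(url)` the same way; Scrapy's engine plays the role of drive.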
Another solution is not to insist on returning the result from the original method (filter_categories in your case), but rather to use request chaining with the meta attribute of requests, and return the result from the last parse method in the chain (test_page in your case).