I'm working with Scrapy. I want to rotate proxies on a per-request basis and get a proxy from an API I have that returns a single proxy. My plan is to make a request to the API, get a proxy, then use it to set the proxy based on:
http://stackoverflow.com/questions/4710483/scrapy-and-proxies
where I would assign it using:
request.meta['proxy'] = 'your.proxy.address';
I have the following:
class ContactSpider(Spider):
    name = "contact"

    def parse(self, response):
        for i in range(1, 3):
            PR = Request('http://myproxyapi.com', headers=self.headers)
            newrequest = Request('http://sitetoscrape.com', headers=self.headers)
            newrequest.meta['proxy'] = PR
but I'm not sure how to use the Scrapy Request object to perform the API call. I'm not getting a response to the PR request while debugging. Do I need to do this in a separate function and use a yield statement, or is my approach wrong?
Yes. Scrapy uses a callback model. You would need to yield the PR request back to the Scrapy engine with a callback; in that callback, read the proxy from the API's response and only then build and yield newrequest with request.meta['proxy'] set. Constructing a Request (as your code does) never fetches anything by itself, which is why you see no response while debugging. A quick example:
See also: http://doc.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-callback-arguments