making request to API from within scrapy function

Posted 2019-06-10 13:07

I'm working with Scrapy. I want to rotate proxies on a per-request basis, getting each proxy from an API I have that returns a single proxy. My plan is to make a request to the API, get a proxy, then use it to set the proxy as described in:

http://stackoverflow.com/questions/4710483/scrapy-and-proxies 

where I would assign it using:

request.meta['proxy'] = 'your.proxy.address'

I have the following:

class ContactSpider(Spider):
    name = "contact"

    def parse(self, response):
        for i in range(1, 3):
            PR = Request('http://myproxyapi.com', headers=self.headers)
            newrequest = Request('http://sitetoscrape.com', headers=self.headers)
            newrequest.meta['proxy'] = PR

but I'm not sure how to use the Scrapy Request object to perform the API call; I'm not getting a response to the PR request while debugging. Do I need to do this in a separate function and use a yield statement, or is my approach wrong?

Tags: proxy scrapy

1 Answer

冷血范
#2 · 2019-06-10 13:27

Do I need to do this in a separate function and use a yield statement or is my approach wrong?

Yes. Scrapy uses a callback model. You would need to:

  1. Yield the PR objects back to the scrapy engine.
  2. Parse the response of PR, and in its callback, yield newrequest.

A quick example:

def parse(self, response):
    for i in range(1, 3):
        PR = Request(
            'http://myproxyapi.com',
            headers=self.headers,
            meta={'newrequest': Request('http://sitetoscrape.com', headers=self.headers)},
            callback=self.parse_PR,
        )
        yield PR

def parse_PR(self, response):
    newrequest = response.meta['newrequest']
    # Extract the proxy address from the API response;
    # get_data_from_response is left for you to implement.
    proxy_data = get_data_from_response(response)
    newrequest.meta['proxy'] = proxy_data
    yield newrequest

See also: http://doc.scrapy.org/en/latest/topics/request-response.html#topics-request-response-ref-request-callback-arguments
