Set proxy to hide my IP address for scraping the w

Posted 2020-05-29 15:49

Question:

I am using Scrapy to crawl a website, and now I need to set a proxy to handle the requests being sent. Can anyone help me set up a proxy in a Scrapy app? Please share a sample link if you have one. I also need a way to find out which IP address each request goes out from.

Answer 1:

You can do it with the code below, found here:

1 – Create a new file called middlewares.py, save it in your Scrapy project, and add the following code to it.

# base64 is needed ONLY if the proxy we are going to use
# requires authentication
import base64


# Start your middleware class
class ProxyMiddleware:
    # Scrapy calls process_request for every outgoing request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines only if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy. base64.encodestring was
        # removed in Python 3.9; b64encode takes and returns bytes.
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
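
To sanity-check the authentication header, the encoding step can be tried on its own, outside Scrapy. This is only a minimal sketch assuming Python 3: `basic_proxy_auth` is a hypothetical helper name, and `user` / `pass` are placeholder credentials, not values from the answer.

```python
import base64


def basic_proxy_auth(username: str, password: str) -> str:
    """Build the value of a Proxy-Authorization header (HTTP Basic scheme)."""
    # b64encode works on bytes, so encode the credentials first
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    # Decode back to str so it can be joined with the scheme name
    return "Basic " + token.decode("ascii")


if __name__ == "__main__":
    print(basic_proxy_auth("user", "pass"))  # Basic dXNlcjpwYXNz
```

Scrapy's built-in HttpProxyMiddleware also accepts credentials embedded in the proxy URL, in the form http://USERNAME:PASSWORD@YOUR_PROXY_IP:PORT, and in that case sets the header itself, so the manual encoding may not be needed at all.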

2 – Open your project’s configuration file (./project_name/settings.py) and add the following code:

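DOWNLOADER_MIDDLEWARES = {
    # Custom middleware runs first (lower number) and sets request.meta['proxy']
    'project_name.middlewares.ProxyMiddleware': 100,
    # Built-in proxy middleware; older Scrapy releases used the
    # 'scrapy.contrib.downloadermiddleware.httpproxy' path instead
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}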
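
The question also asks how to tell which IP a request goes out from. One possible approach, sketched here under assumptions, is a second downloader middleware that logs the proxy recorded in `request.meta` for each response. The class name `ProxyLoggingMiddleware` is hypothetical and is not part of Scrapy or of the original answer.

```python
import logging

logger = logging.getLogger(__name__)


class ProxyLoggingMiddleware:
    """Downloader middleware that reports which proxy each request used."""

    def process_response(self, request, response, spider):
        # request.meta['proxy'] holds the proxy URL set earlier, if any
        proxy = request.meta.get("proxy", "no proxy (direct connection)")
        logger.info("%s fetched via %s", request.url, proxy)
        # Returning the response lets Scrapy continue processing it as usual
        return response
```

To try it, the class would be added to middlewares.py and registered in DOWNLOADER_MIDDLEWARES alongside the entries above (for example with priority 120, an arbitrary choice). Note that this logs the proxy Scrapy was told to use, not what the target site sees; to confirm the latter, the spider can request an IP-echo service such as https://httpbin.org/ip and inspect the response body.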

You can also use multiple proxies with Scrapy. More information can be found here.
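
As a rough illustration of the multiple-proxy idea, the middleware below picks a proxy at random for every request. It is a sketch under assumptions rather than the method the linked page describes: `RandomProxyMiddleware` and the `PROXY_LIST` setting are hypothetical names, and the list is expected to hold proxy URLs in the same form as in step 1.

```python
import random


class RandomProxyMiddleware:
    """Pick a proxy at random for every outgoing request."""

    def __init__(self, proxies):
        if not proxies:
            # Fail early instead of silently crawling without a proxy
            raise ValueError("at least one proxy URL is required")
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting defined in settings.py
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Choose one proxy per request; returning None lets Scrapy carry on
        request.meta["proxy"] = random.choice(self.proxies)
```

With this approach, settings.py would define PROXY_LIST as a list of proxy URLs and register the class in DOWNLOADER_MIDDLEWARES in place of ProxyMiddleware. Random choice is the simplest policy; round-robin rotation or dropping proxies that fail are common refinements.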