Question:
I am using Scrapy to crawl a website, and now I need to set a proxy to handle the requests that are sent. Can anyone help me set up a proxy in a Scrapy app? Please share a sample link if you have one. I also need a way to find out which IP the requests are going out from.
Answer 1:
You can do it with the code below, found here:
1 – Create a new file called middlewares.py, save it in your Scrapy project, and add the following code to it:
# Import the base64 library; we need it ONLY if the proxy
# we are going to use requires authentication
import base64

# Start your middleware class
class ProxyMiddleware(object):
    # Overwrite process_request
    def process_request(self, request, spider):
        # Set the location of the proxy
        request.meta['proxy'] = "http://YOUR_PROXY_IP:PORT"

        # Use the following lines if your proxy requires authentication
        proxy_user_pass = "USERNAME:PASSWORD"
        # Set up basic authentication for the proxy
        # (b64encode works on bytes, so encode/decode around it)
        encoded_user_pass = base64.b64encode(proxy_user_pass.encode()).decode()
        request.headers['Proxy-Authorization'] = 'Basic ' + encoded_user_pass
2 – Open your project’s configuration file (./project_name/settings.py) and add the following code:
DOWNLOADER_MIDDLEWARES = {
    # Scrapy's built-in proxy middleware (older Scrapy versions exposed it as
    # scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware)
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
    # The custom middleware from step 1
    'project_name.middlewares.ProxyMiddleware': 100,
}
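The question also asks how to check which IP the requests actually go out from. A minimal sketch of one way to verify this, assuming the setup above, is to crawl an IP-echo service such as https://httpbin.org/ip and log the response; the spider name and proxy placeholder below are only illustrative, and recent Scrapy versions also let you set the proxy (credentials included) directly on request.meta instead of writing a middleware.

import json

import scrapy

class ProxyCheckSpider(scrapy.Spider):
    # Fetches an IP-echo service and logs which IP the request left from
    name = "proxy_check"

    def start_requests(self):
        # Placeholder proxy URL; if the ProxyMiddleware from step 1 is
        # enabled you can drop the meta argument entirely
        yield scrapy.Request(
            "https://httpbin.org/ip",
            meta={"proxy": "http://USERNAME:PASSWORD@YOUR_PROXY_IP:PORT"},
            callback=self.parse,
        )

    def parse(self, response):
        # httpbin.org/ip returns JSON like {"origin": "1.2.3.4"}
        origin = json.loads(response.text).get("origin")
        self.logger.info("Requests are going out from IP: %s", origin)

Run it with scrapy crawl proxy_check; the logged origin should be the proxy's IP rather than your own.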
Also, you can use multiple proxies with Scrapy; more information can be found here.
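As a rough illustration of the multiple-proxy idea (not necessarily the approach from that link), a downloader middleware can pick a random proxy per request from a list kept in settings.py; the PROXY_LIST setting name below is an assumed custom setting, not a standard Scrapy one.

import random

class RandomProxyMiddleware(object):
    # Chooses a random proxy from the PROXY_LIST setting for every request
    def __init__(self, proxies):
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        # e.g. PROXY_LIST = ["http://user:pass@1.2.3.4:8080", "http://5.6.7.8:3128"]
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        if self.proxies:
            request.meta["proxy"] = random.choice(self.proxies)

Register it in DOWNLOADER_MIDDLEWARES the same way as the ProxyMiddleware above.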
Tags: web-scraping