How do you use proxy support with the Python web-scraping framework Scrapy?
In Windows I put together a couple of previous answers and it worked. I simply set the `http_proxy` environment variable from the command prompt and then launched my program, "dmzo" (I'm writing the name because it's the one you find in the tutorial on the internet, and if you're here you have probably started from that tutorial).
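Those two steps might look like this at the Windows command prompt (the proxy address is a placeholder, and `dmzo` stands in for whatever your spider is called):

```shell
C:\> set http_proxy=http://proxy.example.com:8080
C:\> scrapy crawl dmzo
```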
As I've had trouble setting the variable in /etc/environment, here is what I put in my spider instead: I set `http_proxy` via `os.environ` at the top of the spider module (Python).
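A minimal sketch of that approach; the proxy URL below is a placeholder, not the original author's address:

```python
import os

# Set the proxy before Scrapy schedules any requests, so the built-in
# HttpProxyMiddleware picks it up from the environment.
# Placeholder address, assumed for illustration.
os.environ["http_proxy"] = "http://proxy.example.com:8080"
```

This has to run before the first request is made, which is why it goes at module level rather than inside a callback.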
Single Proxy
1. Enable `HttpProxyMiddleware` in your `settings.py`.
2. Pass the proxy to the request via `request.meta`.
3. You can also choose a proxy address at random if you have an address pool.
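The three steps can be sketched as follows. The middleware path is Scrapy's built-in `HttpProxyMiddleware`; the priority value, the pool addresses, and the `meta` variable are assumptions for illustration:

```python
import random

# Step 1: in settings.py, enable Scrapy's built-in proxy middleware.
# The priority value 1 is an assumption; any valid order works.
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": 1,
}

# Step 2: in the spider, each request carries its proxy in request.meta,
# e.g. scrapy.Request(url, meta={"proxy": "http://proxy1.example.com:8080"}).

# Step 3: with an address pool (placeholder addresses), pick one at random
# and build the meta dict for the next request.
PROXY_POOL = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]
meta = {"proxy": random.choice(PROXY_POOL)}
```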
Multiple Proxies
There is a nice middleware for this written by someone else: [Scrapy proxy middleware](https://github.com/aivarsk/scrapy-proxies).
From the Scrapy FAQ, the easiest way to use a proxy is to set the environment variable `http_proxy`. How this is done depends on your shell. If you want to proxy HTTPS traffic when visiting HTTPS sites, set the environment variable `https_proxy` as well.
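For example, in bash (the proxy addresses are placeholders):

```shell
# Proxy for plain HTTP requests.
export http_proxy=http://proxy.example.com:8080
# Proxy for HTTPS requests.
export https_proxy=https://proxy.example.com:8080
```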