I am writing a web scraper using Selenium for Python. The scraper visits the same sites many times per hour, so I was hoping to find a way to change my IP address every few searches. What is the best strategy for this (I am using Firefox)? Is there any prewritten code or a CSV of IP addresses I can cycle through? I am completely new to masking IPs, proxies, etc., so please go easy on me!
Answer 1:
Try using a proxy. There are free options (not very reliable) and paid services.
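from selenium import webdriver
from selenium.webdriver.firefox.options import Options

def change_proxy(proxy, port):
    """Return a new Firefox driver that routes HTTP and HTTPS traffic through proxy:port."""
    options = Options()
    options.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
    options.set_preference("network.proxy.http", proxy)
    options.set_preference("network.proxy.http_port", int(port))  # port prefs must be integers
    options.set_preference("network.proxy.ssl", proxy)
    options.set_preference("network.proxy.ssl_port", int(port))
    # Selenium 4 style; older releases took a FirefoxProfile carrying the same preferences.
    return webdriver.Firefox(options=options)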
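To switch proxies every few searches, as the question asks, something has to decide when to move on to the next address. Here is a minimal sketch of that bookkeeping. The names `load_proxies` and `ProxyRotator` are made up for illustration, the CSV is assumed to hold one `host,port` row per line with no header, and the addresses are placeholders from the documentation range rather than working proxies.

```python
import csv
import io
from itertools import cycle


def load_proxies(csv_text):
    """Parse 'host,port' rows (no header row assumed) into (host, port) tuples."""
    rows = csv.reader(io.StringIO(csv_text))
    return [(row[0].strip(), int(row[1])) for row in rows if len(row) >= 2]


class ProxyRotator:
    """Hand out the same proxy for `every` searches, then move to the next one."""

    def __init__(self, proxies, every=3):
        if not proxies:
            raise ValueError("need at least one proxy")
        self._pool = cycle(proxies)  # wraps around to the first proxy at the end
        self._every = every
        self._used = 0
        self._current = next(self._pool)

    def next_proxy(self):
        # Switch once the current proxy has already served `every` searches.
        if self._used == self._every:
            self._current = next(self._pool)
            self._used = 0
        self._used += 1
        return self._current


# Placeholder addresses (RFC 5737 documentation range), not real proxies.
SAMPLE_CSV = "192.0.2.10,8080\n192.0.2.11,3128\n"
rotator = ProxyRotator(load_proxies(SAMPLE_CSV), every=2)
```

Before each search, call `rotator.next_proxy()`. When the returned pair differs from the one you are currently using, quit the old driver and build a new one with `change_proxy(host, port)`. Firefox reads these preferences at startup, so restarting the driver is the simplest way to apply a new proxy.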
Answer 2:
Your ISP assigns your IP address. If you sign up for a service like hidemyass.com, they will probably provide an app that changes your proxy for you, although I don't know how they do it.
If their app cycles you through various proxies, then all of your internet traffic goes through whichever proxy is active, including your scraper's. The scraper doesn't need to know about the proxies or how HideMyAss works; it will connect through them just like your browser, FTP client, or any other program.