I'm trying to download a txt file which you can find here. Downloading the file is not a problem:
import urllib  # Python 2
testfile = urllib.URLopener()
testfile.retrieve(_proxy_list_download_, "proxies.txt")
The problem is that the downloaded file behaves strangely. When I open it in any text editor I can see the content, including the IP addresses, but when I print the content to the console I get this:
212.3.183.210:8080; 0; 0; anonymous proxy; Italy; ; a; in); an Jose); ree download proxy IP
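Output like this, where the last record appears to be written over the tails of earlier lines, might mean the file separates lines with bare carriage returns instead of newlines. That is only a guess; I have not verified it against the real file. A minimal sketch for checking it, where `count_line_endings` is a hypothetical helper and the sample string is made up for illustration:

```python
def count_line_endings(text):
    """Count CRLF, bare CR and bare LF line endings (hypothetical helper)."""
    crlf = text.count('\r\n')
    bare_cr = text.count('\r') - crlf  # CRs that are not part of a CRLF pair
    bare_lf = text.count('\n') - crlf  # LFs that are not part of a CRLF pair
    return {'crlf': crlf, 'cr': bare_cr, 'lf': bare_lf}

# Illustrative data, not the real file: two records ending in bare carriage returns.
sample = "first line, fairly long\rshort\r"

# repr() makes the otherwise invisible control characters visible.
print(repr(sample))
print(count_line_endings(sample))
```

To run it on the real file, the content would probably have to be read in binary mode (`open('proxies.txt', 'rb')`) so that no line-ending translation happens before counting.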
And when I try to extract the IP addresses from the file, the result is empty:
import re
with open('proxies.txt') as f:
    content = f.read()
ip = re.findall(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", content)
I've already tried another regex:
r'([0-9]+)(?:\.[0-9]+){3}'
This regex returned only 3-digit numbers.
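A possible explanation for the partial result: when a pattern contains exactly one capturing group, `re.findall` returns only what that group matched, not the whole match. The sketch below compares the pattern above with a variant that uses only a non-capturing group. The variant and the names `grouped` and `whole` are mine, and the sample is one line from the list.

```python
import re

sample = "39.166.95.9:8123; 0; 0; high-anonymous; China;"

# One capturing group: findall returns only that group (the first octet).
grouped = re.findall(r'([0-9]+)(?:\.[0-9]+){3}', sample)

# No capturing groups: findall returns the whole matched text.
whole = re.findall(r'[0-9]+(?:\.[0-9]+){3}', sample)

print(grouped)
print(whole)
```

Neither pattern checks that each number is in the range 0-255, so both only find things that look like IPv4 addresses.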
Do you have any idea how to parse those IPs?
EDIT: Here is the text copied and pasted from the text editor, although in the editor everything is on one line:
# http://proxy-ip-list.com/ provides you this fresh txt proxy list to free download proxy IP
# Date: Sat, 27 Jun 2015 12:53:02 +0000
39.166.95.9:8123; 0; 0; high-anonymous; China;
178.189.92.118:3129; 16.83; 405; high-anonymous; Austria;
198.2.202.33:8090; 8.05; 884; anonymous; United States (CA, San Jose);
171.96.152.89:8080; 0; 0; anonymous; Thailand;
153.149.104.76:80; 0; 0; anonymous; Japan (Tokyo);
106.187.52.191:80; 0; 0; anonymous proxy; Japan;
194.187.214.204:80; 0.91; 6374; anonymous proxy; Finland;
59.78.160.247:8080; 0; 0; anonymous; China (Shanghai);
61.156.3.166:80; 1.12; 1449; anonymous proxy; China (Jinan);
221.238.140.164:8080; 1.39; 257; anonymous; China (Tianjin);
117.178.157.107:8123; 8.44; 847; high-anonymous; China;
39.166.205.95:8123; 0; 0; high-anonymous; China;
117.163.216.8:8123; 4.21; 1577; high-anonymous; China;
189.31.143.250:3128; 0; 0; high-anonymous; Brazil;
183.89.84.82:8080; 0; 0; anonymous proxy; Thailand;
183.88.41.42:8080; 0; 0; anonymous; Thailand;
212.3.183.210:8080; 0; 0; anonymous proxy; Italy;
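Given this format, one way the address and port could be pulled out of each record is sketched below. It is not a verified solution for the real file: `extract_proxies` and `PROXY_RE` are hypothetical names, and the sample joins a few of the lines above with bare carriage returns purely for illustration. `str.splitlines()` is used because it accepts `\r`, `\n` and `\r\n` as line boundaries, so the sketch should not depend on which convention the file uses.

```python
import re

# An IPv4-looking address followed by ":port" at the start of a record.
# Octets are not range-checked (0-255).
PROXY_RE = re.compile(r'^(\d{1,3}(?:\.\d{1,3}){3}):(\d+)')

def extract_proxies(text):
    """Return (ip, port) pairs found in proxy-list text (hypothetical helper)."""
    proxies = []
    # splitlines() treats \r, \n and \r\n all as line boundaries.
    for line in text.splitlines():
        match = PROXY_RE.match(line.strip())
        if match:  # comment lines starting with '#' simply do not match
            proxies.append((match.group(1), int(match.group(2))))
    return proxies

# Illustrative sample: lines from the list, joined with bare carriage returns.
sample = ("# Date: Sat, 27 Jun 2015 12:53:02 +0000\r"
          "39.166.95.9:8123; 0; 0; high-anonymous; China;\r"
          "212.3.183.210:8080; 0; 0; anonymous proxy; Italy;\r")

print(extract_proxies(sample))
```

For the downloaded file, the same function could be called on `content` after reading `proxies.txt`.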