可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I'm trying to do some scraping, but I get blocked every 4 requests. I have tried to change proxies but the error is the same. What should I do to change it properly?

Here is some code where I try it. First I get proxies from a free web. Then I go do the request with the new proxy but it doesn't work because I get blocked.

from fake_useragent import UserAgent
import requests

def get_player(id,proxy):
    ua=UserAgent()
    headers = {'User-Agent':ua.random}

    url='https://www.transfermarkt.es/jadon-sancho/profil/spieler/'+str(id)

    try:
        print(proxy)
        r=requests.get(u,headers=headers,proxies=proxy)
    execpt:

....
code to manage the data
....

Getting proxies

def get_proxies():
    ua=UserAgent()
    headers = {'User-Agent':ua.random}
    url='https://free-proxy-list.net/'

    r=requests.get(url,headers=headers)
    page = BeautifulSoup(r.text, 'html.parser')

    proxies=[]

    for proxy in page.find_all('tr'):
        i=ip=port=0

    for data in proxy.find_all('td'):
        if i==0:
            ip=data.get_text()
        if i==1:
            port=data.get_text()
        i+=1

    if ip!=0 and port!=0:
        proxies+=[{'http':'http://'+ip+':'+port}]

return proxies

Calling functions

proxies=get_proxies()
for i in range(1,100):
    player=get_player(i,proxies[i//4])

....
code to manage the data  
....

I know that proxies scrape is well because when i print then I see something like: {'http': 'http://88.12.48.61:42365'} I would like to don't get blocked.

回答1:

The problem with using free proxies from sites like this is

websites know about these and may block just because you're using one of them
you don't know that other people haven't gotten them blacklisted by doing bad things with them
the site is likely using some form of other identifier to track you across proxies based on other characteristics (device fingerprinting, proxy-piercing, etc)

Unfortunately, there's not a lot you can do other than be more sophisticated (distribute across multiple devices, use VPN/TOR, etc) and risk your IP being blocked for attempting DDOS-like traffic or, preferably, see if the site has an API for access

回答2:

import requests
from itertools import cycle

list_proxy = ['socks5://Username:Password@IP1:20000',
              'socks5://Username:Password@IP2:20000',
              'socks5://Username:Password@IP3:20000',
               'socks5://Username:Password@IP4:20000',
              ]

proxy_cycle = cycle(list_proxy)
# Prime the pump
proxy = next(proxy_cycle)

for i in range(1, 10):
    proxy = next(proxy_cycle)
    print(proxy)
    proxies = {
      "http": proxy,
      "https":proxy
    }
    r = requests.get(url='https://ident.me/', proxies=proxies)
    print(r.text)