Unable to let my script keep retrying a few times

Posted 2019-08-18 04:20

Question:

I've created a script in Python to fetch the titles of certain posts from different links on a webpage. The problem is that the webpage I'm trying to scrape sometimes doesn't give me a valid response, but I do get a valid response when I retry two or three times.

I've been trying to create a loop so that the script checks whether my defined title is empty. If the title is empty, the script should keep retrying up to 4 times to see if it can succeed. After the fourth try for a link, the script should move on to the next link, repeating the same process until all the links are exhausted.

This is my attempt so far:

import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
    ]
counter = 0

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text,"lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError: title = ""

    if not title:
        while counter<=4:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            fetch_data(link)
    else:
        counter = 0

    print("tried with this link:",link)

if __name__ == '__main__':
    for link in links:
        fetch_data(link)

This is the output I can see in the console at this moment:

trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4

My expected output:

trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
trying 0 times
trying 1 times
trying 2 times
trying 3 times
trying 4 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4

PS: I deliberately used a wrong selector in my script so that it meets the condition I've defined above.

How can I make my script keep retrying every link a few times when the condition is not met?

Answer 1:

I think you should rearrange your code as follows, resetting the counter before each link rather than only after a success:

import time
import requests
from bs4 import BeautifulSoup
links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
    ]

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError:
        title = ""

    if not title:
        while counter <= 4:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            fetch_data(link)

if __name__ == '__main__':
    for link in links:
        counter = 0   # reset the counter before each link
        fetch_data(link)
        print("tried with this link:", link)