A few days back I created this post seeking a solution: I wanted my script to loop through a few links and, for each one, retry up to four times whenever my defined title (supposed to be extracted from that link) came back empty. If the title is still empty after four tries, the script should break out of the loop and move on to the next link, repeating the same process.
This is how I got it working: by changing fetch_data(link) to return fetch_data(link), and by placing the counter = 0 reset outside the while loop but inside the if statement.
Rectified script:
import time
import requests
from bs4 import BeautifulSoup

links = [
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3",
    "https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4"
]

counter = 0

def fetch_data(link):
    global counter
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    try:
        title = soup.select_one("p.tcode").text
    except AttributeError:
        title = ""

    if not title:
        while counter <= 3:
            time.sleep(1)
            print("trying {} times".format(counter))
            counter += 1
            return fetch_data(link)  # First fix
        counter = 0  # Second fix

    print("tried with this link:", link)

if __name__ == '__main__':
    for link in links:
        fetch_data(link)
This is the output the above script produces (as desired):
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=2
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=3
trying 0 times
trying 1 times
trying 2 times
trying 3 times
tried with this link: https://stackoverflow.com/questions/tagged/web-scraping?sort=newest&page=4
Note: I deliberately used a wrong selector in the script so that it would always meet the condition I defined above.
Why should I use return fetch_data(link) instead of fetch_data(link), when the two expressions appear to work identically most of the time?