I want to use BeautifulSoup to repeatedly retrieve the URL at a specific position. Imagine there are 4 different URL lists, each containing 100 different URL links.
I need to get and print the 3rd URL on every list; each retrieved URL (e.g. the 3rd URL on the first list) leads to the next list, where I again get and print the 3rd URL, and so on until the 4th retrieval.
However, my loop only produces the first result (the 3rd URL on list 1), and I don't know how to feed the new URL back into the while loop and continue the process.
Here is my code:
import urllib.request
import json
import ssl
from bs4 import BeautifulSoup

num = int(input('enter count times: '))
position = int(input('enter position: '))
url = 'https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html'
print(url)
count = 0
order = 0
while count < num:
    context = ssl._create_unverified_context()
    htm = urllib.request.urlopen(url, context=context).read()
    soup = BeautifulSoup(htm)
    for i in soup.find_all('a'):
        order += 1
        if order == position:
            x = i.get('href')
            print(x)
            count += 1
            url = x
print('done')
Just get the link from `find_all()` by index:
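Applied to the question's setup, that might look like the sketch below. The function names are my own and it is untested against the live page; `position` stays 1-based, as in the question:

```python
import ssl
import urllib.request
from bs4 import BeautifulSoup

def nth_link(html, position):
    """Return the href of the `position`-th (1-based) <a> tag in `html`."""
    links = BeautifulSoup(html, 'html.parser').find_all('a')
    return links[position - 1].get('href')

def follow(url, position, count):
    """Print the link at `position` on each page, `count` times,
    following the printed link to reach the next page."""
    context = ssl._create_unverified_context()
    for _ in range(count):
        html = urllib.request.urlopen(url, context=context).read()
        url = nth_link(html, position)  # the found link becomes the next page
        print(url)
```

Indexing also removes the need for the running `order` counter, which in the question's code is never reset between pages and so can match `position` only once.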
This is a good problem to use recursion on. Try writing a recursive function to do this, then call it with the desired parameters: in your case, 2 for the URL index on the list of URLs, 0 for the counter, and 4 for how deep you want to go recursively.

P.S. I am using requests instead of urllib, and I didn't test this, although I recently used a very similar function with success.
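A sketch of such a recursive function (untested, and the function and parameter names are my own; it uses requests and a 0-based index, as the answer describes):

```python
import requests
from bs4 import BeautifulSoup

def get_url(url, index, counter, depth):
    """Print the link at `index` (0-based) on the page at `url`,
    then recurse into that link until `depth` retrievals are done."""
    if counter >= depth:  # stop once `depth` links have been followed
        return url
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    next_url = soup.find_all('a')[index].get('href')
    print(next_url)
    return get_url(next_url, index, counter + 1, depth)

# In the question's case: index 2 (the 3rd link), counter 0, depth 4.
# get_url('https://pr4e.dr-chuck.com/tsugi/mod/python-data/data/known_by_Fikret.html', 2, 0, 4)
```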