I am trying to scrape the hrefs of all the listings. I am fairly new to BeautifulSoup and have only done a bit of scraping before, but I can't for the life of me extract them. See my code below; the container has length zero when I run this script.
I also try to select the price (soup.findAll("span", {"class": "amount"})), but that doesn't come back either. Any advice most welcome :)
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup
url = 'https://www.takealot.com/computers/laptops-10130'
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
req = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(req)
respData = str(resp.read())
soup = BeautifulSoup(respData, 'html.parser')
container = soup.find_all("div", {"class": "p-data left"})
The page is rendered with JavaScript. There are several ways to render and scrape it.
I can scrape it with Selenium. First install Selenium (pip install selenium).
Then get a driver from https://sites.google.com/a/chromium.org/chromedriver/downloads. You can use a headless version of Chrome ("Chrome Canary") if you are on Windows or Mac. A sketch of the approach follows.
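Here is a minimal sketch of the Selenium route. It assumes chromedriver is on your PATH, a reasonably recent Selenium (4.x), and that the "p-data left" class from your code is still what the site uses; adjust the selectors if the markup has changed.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("--headless")  # run without opening a browser window
driver = webdriver.Chrome(options=options)
driver.get("https://www.takealot.com/computers/laptops-10130")

# Wait until at least one product block has been rendered by the page's JavaScript
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "p-data"))
)

# Hand the rendered DOM to BeautifulSoup and pull out the hrefs
soup = BeautifulSoup(driver.page_source, "html.parser")
driver.quit()

containers = soup.find_all("div", {"class": "p-data left"})
for c in containers:
    a = c.find("a", href=True)
    if a:
        print(a["href"])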
Alternatively, use PyQt5 and its QtWebEngine module to render the page, for example:
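This is a sketch of the usual PyQt5 rendering pattern: subclass QWebEnginePage, load the URL, wait for loadFinished, and grab the HTML. It assumes PyQt5 and PyQtWebEngine are installed; the Render class name is just for illustration.

import sys
from PyQt5.QtWidgets import QApplication
from PyQt5.QtCore import QUrl
from PyQt5.QtWebEngineWidgets import QWebEnginePage
from bs4 import BeautifulSoup

class Render(QWebEnginePage):
    """Load a URL, let the JavaScript run, and keep the rendered HTML."""
    def __init__(self, url):
        self.app = QApplication(sys.argv)
        super().__init__()
        self.html = None
        self.loadFinished.connect(self._on_load_finished)
        self.load(QUrl(url))
        self.app.exec_()  # blocks until quit() is called below

    def _on_load_finished(self, ok):
        # toHtml is asynchronous; the callback receives the rendered HTML
        self.toHtml(self._store_html)

    def _store_html(self, html):
        self.html = html
        self.app.quit()

page = Render("https://www.takealot.com/computers/laptops-10130")
soup = BeautifulSoup(page.html, "html.parser")
containers = soup.find_all("div", {"class": "p-data left"})
for c in containers:
    a = c.find("a", href=True)
    if a:
        print(a["href"])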
Alternatively, use dryscrape (Linux only):
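A minimal sketch with dryscrape, assuming it and its webkit-server backend are installed (pip install dryscrape):

import dryscrape
from bs4 import BeautifulSoup

# On a headless server you may also need: dryscrape.start_xvfb()
session = dryscrape.Session()
session.visit("https://www.takealot.com/computers/laptops-10130")

# session.body() returns the HTML after JavaScript has run
soup = BeautifulSoup(session.body(), "html.parser")
containers = soup.find_all("div", {"class": "p-data left"})
for c in containers:
    a = c.find("a", href=True)
    if a:
        print(a["href"])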
The output is the same in all cases:
However, when testing with your URL I found the results were not reproducible every time; occasionally containers came back empty even after the page had rendered.