import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://www.amazon.in/s/ref=sr_nr_p_36_4?fst=as%3Aoff&rh=n%3A976419031%2Cn%3A1389401031%2Cn%3A1389432031%2Ck%3Amobile%2Cp_36%3A1318507031&keywords=mobile&ie=UTF8&qid=1543902909&rnid=1318502031"

# Download the search results page and parse it
uClient = uReq(my_url)
raw_html = uClient.read()
uClient.close()
page_soup = soup(raw_html, "html.parser")

# Each search result sits in its own s-item-container div
containers = page_soup.findAll("div", {"class": "s-item-container"})

filename = "Product.csv"
f = open(filename, "w")
headers = "Name,Price,Prime\n"
f.write(headers)

for container in containers:
    title_container = container.findAll("div", {"class": "a-row a-spacing-mini"})
    product_name = title_container[0].div.a.h2.text

    # Strikethrough (original) price
    price = container.findAll("span", {"class": "a-size-small a-color-secondary a-text-strike"})
    product_price = price[0].text.strip()

    prime = container.findAll("i", {"class": "a-icon a-icon-prime a-icon-small s-align-text-bottom"})
    product_prime = prime[0].text

    print("product_name : " + product_name)
    print("product_price : " + product_price)
    print("product_prime : " + product_prime)

    f.write(product_name + "," + product_price + "," + product_prime + "\n")

f.close()
I wrote my first web scraping code, but for some reason it only loops 4 times and then fails with this error:

File "firstwebscrapping.py", line 23, in <module>
    product_price = price[0].text.strip()
IndexError: list index out of range

Can someone explain what I've done wrong?
The first problem is that not every item has both an original (strikethrough) price and a current price, so change the class you search for.
From
"class":"a-size-small a-color-secondary a-text-strike"
To
"class":"a-size-base a-color-price s-price a-text-bold"
Another issue comes up with the containers: the s-item-container divs appear not only in ajaxData but also in atfResults:

div#search-main-wrapper > div#ajaxData > s-item-container
div#search-main-wrapper > div#atfResults > s-item-container

So use the select function to get the target div first, and then search for the containers inside it:

target = page_soup.select('div#atfResults')
containers = target[0].findAll("div",{"class":"s-item-container"})

Hope this solves your problem.
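Both changes from this answer can be tried offline against a small hand-written page. The HTML below is a hypothetical, heavily simplified imitation of the structure described above (only the ids and class names come from the answer; the prices are made up), and Amazon's real markup has very likely changed since. The sketch uses find_all(), the current name for findAll().

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described above:
# s-item-container divs live under both div#ajaxData and div#atfResults,
# and only the first result has a strikethrough (original) price.
HTML = """
<div id="search-main-wrapper">
  <div id="ajaxData">
    <div class="s-item-container">
      <span class="a-size-base a-color-price s-price a-text-bold">12,999</span>
    </div>
  </div>
  <div id="atfResults">
    <div class="s-item-container">
      <span class="a-size-small a-color-secondary a-text-strike">49,999</span>
      <span class="a-size-base a-color-price s-price a-text-bold">44,999</span>
    </div>
    <div class="s-item-container">
      <span class="a-size-base a-color-price s-price a-text-bold">37,999</span>
    </div>
  </div>
</div>
"""

page_soup = BeautifulSoup(HTML, "html.parser")

# Searching the whole page picks up containers from both sections.
all_containers = page_soup.find_all("div", {"class": "s-item-container"})

# Narrow the search to div#atfResults first; select() returns a list,
# so check it before indexing.
target = page_soup.select("div#atfResults")
containers = target[0].find_all("div", {"class": "s-item-container"}) if target else []

# In this sample the current-price class is present on every result,
# while the strikethrough class is missing on the second one.
prices = []
for container in containers:
    current = container.find_all("span", {"class": "a-size-base a-color-price s-price a-text-bold"})
    prices.append(current[0].text.strip())

print(len(all_containers), len(containers), prices)
```

On this sample the whole-page search finds three containers while the scoped one finds two, and the current-price class yields a price for both results. On a live page an item could still lack that span, so checking the list before indexing it remains a good idea.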
Not every container has a <span class="a-size-small a-color-secondary a-text-strike">. So when you search for those elements:

price = container.findAll("span",{"class":"a-size-small a-color-secondary a-text-strike"})

and none are found, price is an empty list. In the next line you access the first element of price:

product_price = price[0].text.strip()

and because price is empty, you get IndexError: list index out of range.
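A minimal sketch reproducing the error on a hypothetical one-item fragment (made-up markup and price): find_all() returns an empty list when nothing matches, and indexing that list raises exactly the IndexError from the question. For comparison, find() returns None in the same situation instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical item that has only a current price, no strikethrough price.
container = BeautifulSoup(
    '<div><span class="a-size-base a-color-price s-price a-text-bold">37,999</span></div>',
    "html.parser",
)

STRIKE = "a-size-small a-color-secondary a-text-strike"

price = container.find_all("span", {"class": STRIKE})
print(price)  # []

error = None
try:
    product_price = price[0].text.strip()
except IndexError as exc:
    error = str(exc)
print(error)  # list index out of range

# find() returns None rather than an empty list when nothing matches.
first = container.find("span", {"class": STRIKE})
print(first)  # None
```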
For example, on the page from the link in your code, you select the strikethrough price, but the OnePlus 6T listing doesn't have one. It only has <span class="a-size-base a-color-price s-price a-text-bold">.

You can check whether price is empty and, if so, search for the price in that span instead.
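The check suggested above could look like the following sketch. get_price, STRIKE, CURRENT and the sample HTML are hypothetical names and data chosen for illustration, and returning an empty string when an item has neither price is just one possible default.

```python
from bs4 import BeautifulSoup

STRIKE = "a-size-small a-color-secondary a-text-strike"   # original price
CURRENT = "a-size-base a-color-price s-price a-text-bold"  # current price


def get_price(container):
    """Return the strikethrough price if there is one, otherwise the current price.

    Returns an empty string when the item has neither (a hypothetical default).
    """
    price = container.find_all("span", {"class": STRIKE})
    if not price:  # empty list: no strikethrough price on this item
        price = container.find_all("span", {"class": CURRENT})
    return price[0].text.strip() if price else ""


# Hypothetical sample: strikethrough + current, current only, and no price at all.
HTML = """
<div class="s-item-container">
  <span class="a-size-small a-color-secondary a-text-strike">49,999</span>
  <span class="a-size-base a-color-price s-price a-text-bold">44,999</span>
</div>
<div class="s-item-container">
  <span class="a-size-base a-color-price s-price a-text-bold">37,999</span>
</div>
<div class="s-item-container">Currently unavailable</div>
"""

page_soup = BeautifulSoup(HTML, "html.parser")
prices = [get_price(c) for c in page_soup.find_all("div", {"class": "s-item-container"})]
print(prices)  # ['49,999', '37,999', '']
```

Inside the question's loop, product_price = get_price(container) would replace the two price lines. The title_container and prime lookups index [0] the same way, so they can fail on items that lack those elements too.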