Web Scraping: List Index Out of Range

Posted 2019-08-25 07:01

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = "https://www.amazon.in/s/ref=sr_nr_p_36_4?fst=as%3Aoff&rh=n%3A976419031%2Cn%3A1389401031%2Cn%3A1389432031%2Ck%3Amobile%2Cp_36%3A1318507031&keywords=mobile&ie=UTF8&qid=1543902909&rnid=1318502031"
uClient = uReq(my_url)
raw_html= uClient.read()
uClient.close()

page_soup = soup(raw_html, "html.parser")
containers = page_soup.findAll("div",{"class":"s-item-container"})

filename = "Product.csv"
f = open (filename , "w")

headers = "Name,Price,Prime \n"
f.write(headers)

for container in containers:

    title_container = container.findAll("div",{"class":"a-row a-spacing-mini"})
    product_name = title_container[0].div.a.h2.text

    price = container.findAll("span",{"class":"a-size-small a-color-secondary a-text-strike"})
    product_price = price[0].text.strip()

    prime = container.findAll("i",{"class":"a-icon a-icon-prime a-icon-small s-align-text-bottom"})
    product_prime = prime[0].text

    print("product_name : " + product_name)
    print("product_price : " + product_price)
    print("product_prime : " + product_prime)

    f.write(product_name + "," + product_price + "," + product_prime + "\n") 
f.close()

I wrote my first web scraping code, but for some reason it looped only 4 times and then showed this error message: (File "firstwebscrapping.py", line 23, in product_price = price[0].text.strip() IndexError: list index out of range). Can someone please explain what I've done wrong?

2 Answers
仙女界的扛把子
Post #2 · 2019-08-25 07:44

The first problem is that not every item has both an original (strikethrough) price and a current price, so change the class you search for.

From "class":"a-size-small a-color-secondary a-text-strike"

To "class":"a-size-base a-color-price s-price a-text-bold"

A second issue comes from how the containers are selected. s-item-container appears not only under ajaxData but also under atfResults:

div#search-main-wrapper > div#ajaxData > s-item-container
div#search-main-wrapper > div#atfResults > s-item-container

So use the select function to get the target div first, target = page_soup.select('div#atfResults'), and then search for the containers inside it:

containers = target[0].findAll("div",{"class":"s-item-container"})

I hope this solves your problem.

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://www.amazon.in/s/ref=sr_nr_p_36_4?fst=as%3Aoff&rh=n%3A976419031%2Cn%3A1389401031%2Cn%3A1389432031%2Ck%3Amobile%2Cp_36%3A1318507031&keywords=mobile&ie=UTF8&qid=1543902909&rnid=1318502031"

# Download the page, then close the connection.
uClient = uReq(my_url)
raw_html = uClient.read()
uClient.close()

page_soup = soup(raw_html, "html.parser")

# Search only inside div#atfResults, because s-item-container
# also appears under div#ajaxData.
target = page_soup.select('div#atfResults')
containers = target[0].findAll("div",{"class":"s-item-container"})

filename = "Product.csv"
f = open(filename, "w")

headers = "Name,Price,Prime\n"
f.write(headers)
print(len(containers))
for container in containers:

    title_container = container.findAll("div",{"class":"a-row a-spacing-mini"})
    product_name = title_container[0].div.a.h2.text

    # Use the current price; the strikethrough price is not always present.
    price = container.findAll("span",{"class":"a-size-base a-color-price s-price a-text-bold"})
    # findAll returns an empty list when nothing matches, so guard the [0] access.
    product_price = price[0].text.strip() if price else ""

    prime = container.findAll("i",{"class":"a-icon a-icon-prime a-icon-small s-align-text-bottom"})
    product_prime = prime[0].text if prime else ""

    print("product_name : " + product_name)
    print("product_price : " + product_price)
    print("product_prime : " + product_prime)

    f.write(product_name + "," + product_price + "," + product_prime + "\n")
f.close()
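
To see why scoping the search to div#atfResults matters, here is a minimal sketch on a made-up HTML fragment. The fragment only mimics the nesting described in this answer and is an assumption, not real Amazon markup. It uses find_all, the current name for findAll in BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics the nesting described in the answer
# (an assumption for illustration, not real Amazon markup).
html = """
<div id="search-main-wrapper">
  <div id="ajaxData">
    <div class="s-item-container">item in ajaxData</div>
  </div>
  <div id="atfResults">
    <div class="s-item-container">phone A</div>
    <div class="s-item-container">phone B</div>
  </div>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")

# Searching the whole page picks up containers under both parents.
all_containers = page_soup.find_all("div", {"class": "s-item-container"})

# Selecting div#atfResults first limits the search to that block.
target = page_soup.select("div#atfResults")
scoped_containers = target[0].find_all("div", {"class": "s-item-container"})

print(len(all_containers))
print(len(scoped_containers))
print([c.text for c in scoped_containers])
```

On this fragment the page-wide search finds one more container than the scoped search, which is the extra item that the select call filters out.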
【Aperson】
Post #3 · 2019-08-25 07:48

Not every container has <span class="a-size-small a-color-secondary a-text-strike">.

So when you search for those elements:

price = container.findAll("span",{"class":"a-size-small a-color-secondary a-text-strike"})

and none are found, price is an empty list. On the next line you access the first element of price:

product_price = price[0].text.strip()

Because price is empty, you get the error IndexError: list index out of range.

For example, on the page from the link in your code, you select the strikethrough price, but the OnePlus 6T doesn't have one. It only has <span class="a-size-base a-color-price s-price a-text-bold">.

You can check whether price is empty and, if it is, search for the price in the span above instead.
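
A minimal sketch of that check, run on a made-up HTML fragment so it works offline. The helper name extract_price, the fragment, and the price values are all assumptions for illustration; only the two class strings come from the page discussed here.

```python
from bs4 import BeautifulSoup

# Made-up fragment (not real Amazon markup): the first item has both prices,
# the second has only the current price, the third has no price at all.
html = """
<div class="s-item-container">
  <span class="a-size-small a-color-secondary a-text-strike"> 100 </span>
  <span class="a-size-base a-color-price s-price a-text-bold"> 90 </span>
</div>
<div class="s-item-container">
  <span class="a-size-base a-color-price s-price a-text-bold"> 250 </span>
</div>
<div class="s-item-container"></div>
"""

STRIKE_CLASS = "a-size-small a-color-secondary a-text-strike"
CURRENT_CLASS = "a-size-base a-color-price s-price a-text-bold"


def extract_price(container):
    """Hypothetical helper: strikethrough price if present, else current price."""
    price = container.find_all("span", {"class": STRIKE_CLASS})
    if not price:
        # Empty list: fall back to the span that holds the current price.
        price = container.find_all("span", {"class": CURRENT_CLASS})
    # Still guard the [0] access in case neither span exists.
    return price[0].text.strip() if price else ""


page_soup = BeautifulSoup(html, "html.parser")
containers = page_soup.find_all("div", {"class": "s-item-container"})
prices = [extract_price(c) for c in containers]
print(prices)
```

In your loop, the line product_price = price[0].text.strip() could then become product_price = extract_price(container), so an item without a strikethrough price no longer stops the loop.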
