I'm trying to create a basic web scraper for Amazon search results. As I iterate through the result pages, I sometimes get to page 5 (sometimes only page 2) before a StaleElementReferenceException
is thrown. When I look at the browser after the exception, I can see that the driver did not scroll down to where the page numbers are (the bottom bar).
My code:
# driver is an already-created WebDriver instance; last_page_number is defined elsewhere in the script
driver.get('https://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=sonicare+toothbrush')
for page in range(1, last_page_number + 1):
    driver.implicitly_wait(10)
    # scroll the current-page marker in the pagination bar into view
    bottom_bar = driver.find_element_by_class_name('pagnCur')
    driver.execute_script("arguments[0].scrollIntoView(true);", bottom_bar)
    current_page_number = int(driver.find_element_by_class_name('pagnCur').text)
    if page == current_page_number:
        # click the link whose text is the next page number
        next_page = driver.find_element_by_xpath('//div[@id="pagn"]/span[@class="pagnLink"]/a[text()="{0}"]'.format(current_page_number + 1))
        next_page.click()
        print('page #', page, ': going to next page')
    else:
        print('page #: ', page, 'error')
I've looked at this question, and I'm guessing that a similar fix can be applied, but I'm not sure how to find an element on the page that disappears when the page changes. Also, judging by how quickly the print statements appear, implicitly_wait(10) isn't actually waiting a full 10 seconds.
The exception is pointing to the line that starts with "driver.execute_script". This is the exception:
StaleElementReferenceException: Message: The element reference of <span class="pagnCur"> is stale; either the element is no longer attached to the DOM, it is not in the current frame context, or the document has been refreshed
Sometimes I'll get a ValueError:
ValueError: invalid literal for int() with base 10: ''
Both errors make me think the script is not waiting for the next page to finish loading before it reads the page number.
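Both symptoms fit that diagnosis: the old `pagnCur` span has been detached (stale), or the new one exists but its text is still empty, so `int('')` fails. An implicit wait only applies while Selenium is *locating* an element; it does not wait for an element's text to change. An explicit wait instead polls a condition until it returns something truthy or a timeout expires, which is what Selenium's `WebDriverWait(driver, 10).until(...)` does. The dependency-free sketch below only illustrates that idea; `wait_until` and `parse_page_number` are hypothetical helper names, not Selenium APIs.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], Optional[T]], timeout: float = 10.0,
               poll_interval: float = 0.5) -> T:
    """Call `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns as soon as the condition succeeds, so the timeout is an upper
    bound rather than a fixed delay (the same "up to" semantics as an implicit wait).
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


def parse_page_number(text: str) -> Optional[int]:
    """Return the page number, or None while the element's text is still empty.

    Calling int('') directly is what raises
    "ValueError: invalid literal for int() with base 10: ''".
    """
    text = text.strip()
    return int(text) if text.isdigit() else None
```

With Selenium itself, `WebDriverWait.until` passes the driver to the condition, so the equivalent would be roughly `WebDriverWait(driver, 10, ignored_exceptions=[StaleElementReferenceException]).until(lambda d: parse_page_number(d.find_element(By.CLASS_NAME, 'pagnCur').text))`; the `ignored_exceptions` argument keeps polling if the span goes stale between lookups.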
If you just want your script to iterate over all the result pages, you don't need any complicated logic: just keep clicking the Next button for as long as it is present.
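A minimal sketch of that "click Next while it exists" loop, written against plain callables so it runs without a browser. The function name `click_through_pages` is hypothetical. With Selenium you would pass something like `find_next=lambda: driver.find_element(By.ID, "pagnNextLink")` (the `pagnNextLink` id is an assumption about Amazon's old pagination markup) and `missing_error=NoSuchElementException`.

```python
from typing import Any, Callable, Type


def click_through_pages(find_next: Callable[[], Any],
                        missing_error: Type[Exception],
                        max_clicks: int = 100) -> int:
    """Keep clicking the "Next" control until it can no longer be found.

    `find_next` should locate a fresh element on every call (never reuse a
    reference from the previous page) and return an object with .click().
    Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:          # safety cap against an endless loop
        try:
            next_button = find_next()
        except missing_error:
            break                       # no Next control: this is the last page
        next_button.click()
        clicks += 1
    return clicks
```

Two caveats apply to the real-browser version: with an implicit wait of 10 seconds, the final lookup on the last page waits the full 10 seconds before raising `NoSuchElementException`; and on a slow page the lookup right after a click may still find the old page's Next button, so pairing the click with a wait for the old page to go stale is safer.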
P.S. Also note that implicitly_wait(10) is not supposed to wait a full 10 seconds; it waits up to 10 seconds for an element to appear in the HTML DOM. If the element is found within 1 or 2 seconds, the wait ends there and you don't wait the remaining 8-9 seconds.
It seems you were almost there.
Preserving your concept of scrolling with scrollIntoView() and printing a couple of helpful debug messages, I have made some minor adjustments that introduce WebDriverWait.
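One way to apply an explicit wait here, which may differ from the answer's own code: keep a reference to an element from the *current* page (for example the `pagnCur` span), click the next-page link, and wait until that old reference goes stale before looking anything up on the new page. The "something that disappears" from the question is simply the old element itself. In Selenium this is `WebDriverWait(driver, 10).until(EC.staleness_of(old_span))`. The stdlib-only sketch below reimplements that check so it can be exercised with stand-in objects; `staleness_condition` and `click_and_wait_for_reload` are hypothetical names.

```python
import time
from typing import Any, Callable, Type


def staleness_condition(element: Any, stale_error: Type[Exception]) -> Callable[[], bool]:
    """Return a check that is True once `element` is detached from the DOM.

    Modelled on Selenium's expected_conditions.staleness_of: probe the old
    element with is_enabled() and treat the stale-element error as success.
    """
    def _is_stale() -> bool:
        try:
            element.is_enabled()
            return False
        except stale_error:
            return True
    return _is_stale


def click_and_wait_for_reload(old_marker: Any, link: Any, stale_error: Type[Exception],
                              timeout: float = 10.0, poll: float = 0.25) -> None:
    """Click `link`, then block until `old_marker` (from the current page) goes stale."""
    link.click()
    is_stale = staleness_condition(old_marker, stale_error)
    deadline = time.monotonic() + timeout
    while not is_stale():
        if time.monotonic() >= deadline:
            raise TimeoutError("page did not reload in time")
        time.sleep(poll)
```

With Selenium you would pass the `pagnCur` span found before the click, the next-page link, and `selenium.common.exceptions.StaleElementReferenceException`; only after the call returns would you look up the new `pagnCur` and scroll it into view.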