I want to scrape all the data from a page that uses infinite scroll. The following Python code works.
for i in range(100):
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(5)
This means that every time I scroll down to the bottom, I wait 5 seconds, which is generally enough for the page to finish loading the newly generated content. But this may not be time-efficient: the page may finish loading the new content in well under 5 seconds. How can I detect whether the page has finished loading the new content each time I scroll down? If I could detect this, I could scroll down again as soon as the page has finished loading, which would be more time-efficient.
The `webdriver` will wait for a page to load by default via the `.get()` method.
As you may be looking for some specific element, as @user227215 said, you should use `WebDriverWait` to wait for an element to be located in your page.
I have used it for checking alerts. You can use any other type of method to find the locator.
EDIT 1:
I should mention that the `webdriver` will wait for a page to load by default. It does not wait for loading inside frames or for AJAX requests. That means when you use `.get('url')`, your browser will wait until the page is completely loaded and then go on to the next command in the code. But when you are posting an AJAX request, `webdriver` does not wait, and it's your responsibility to wait an appropriate amount of time for the page or a part of the page to load; for that there is a module named `expected_conditions`.
Here I did it using a rather simple form:
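The code for this answer is missing here. One simple form this could take, purely as a guess at the idea and not the author's actual snippet, is a plain polling loop that keeps looking for an element until it appears. `poll_for_element` and its parameters are hypothetical names.

```python
import time


def poll_for_element(driver, name, timeout=30.0, interval=0.5):
    """Poll until an element with the given name attribute exists.

    Returns the first matching element, or None if `timeout` seconds pass.
    `poll_for_element` is a hypothetical helper, not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # "name" is the string value of By.NAME; find_elements returns
        # an empty list (instead of raising) when nothing matches.
        elements = driver.find_elements("name", name)
        if elements:
            return elements[0]
        time.sleep(interval)
    return None
```

Unlike a bare `while` loop with a catch-all `except`, this version gives up after `timeout` seconds, so a missing element cannot hang the script forever.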
How about putting `WebDriverWait` in a while loop and catching the exceptions?
Find below 3 methods:
readyState
Checking page readyState (not reliable):
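A minimal sketch of the `readyState` check, since the original snippet is not shown here. The function name `page_has_loaded` is an assumption; the only Selenium call used is `execute_script`.

```python
def page_has_loaded(driver):
    """Return True when the browser reports the document as fully loaded.

    `page_has_loaded` is a hypothetical helper name.
    """
    # document.readyState is "loading", "interactive" or "complete".
    state = driver.execute_script("return document.readyState;")
    return state == "complete"
```

One likely reason this is not reliable: `readyState` can still be `"complete"` for the old page right after a click, and it says nothing about AJAX requests fired after the initial load, which is exactly what infinite scroll does.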
id
Comparing new page ids with the old one:
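A sketch of the id comparison, under the assumption that "page id" means the internal `id` Selenium assigns to the `<html>` element (a new document gets a new one). `get_page_id` and `page_has_changed` are hypothetical names.

```python
def get_page_id(driver):
    """Return Selenium's internal id of the current <html> element, or None."""
    # "tag name" is the string value of By.TAG_NAME.
    html = driver.find_elements("tag name", "html")
    return html[0].id if html else None


def page_has_changed(driver, old_page_id):
    """Return True once the <html> element differs from the remembered one."""
    new_page_id = get_page_id(driver)
    return new_page_id is not None and new_page_id != old_page_id
```

The idea would be to call `get_page_id(driver)` before triggering navigation, then poll `page_has_changed(driver, old_id)` afterwards.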
staleness_of
Using the `staleness_of` method:
For more details, check Harry's blog.
From selenium/webdriver/support/wait.py
On a side note, instead of scrolling down 100 times, you can check whether there are no more modifications to the DOM (this is the case where the bottom of the page is AJAX lazy-loaded).
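One way to sketch this: scroll, then poll `document.body.scrollHeight` and scroll again as soon as it changes, stopping when it stays the same for a while. Using the page height as a stand-in for "the DOM changed" is my assumption (comparing `driver.page_source` would be another option), and `scroll_to_end` with its parameters is a hypothetical helper.

```python
import time


def scroll_to_end(driver, pause=0.5, max_wait=10.0, max_scrolls=1000):
    """Scroll down until the page height stops growing.

    After each scroll, poll every `pause` seconds for up to `max_wait`
    seconds and continue as soon as the height changes.
    Returns the number of scrolls performed.
    """
    get_height = "return document.body.scrollHeight;"
    last_height = driver.execute_script(get_height)
    scrolls = 0
    while scrolls < max_scrolls:
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        scrolls += 1
        deadline = time.monotonic() + max_wait
        new_height = last_height
        while time.monotonic() < deadline:
            time.sleep(pause)
            new_height = driver.execute_script(get_height)
            if new_height != last_height:
                break  # new content arrived, no need to keep waiting
        if new_height == last_height:
            # Nothing changed within max_wait: assume we hit the bottom.
            return scrolls
        last_height = new_height
    return scrolls
```

With this, a fast page costs only about `pause` seconds per scroll, while `max_wait` plays the role of the original 5-second sleep as an upper bound.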