I've written a script in vba in combination with selenium to parse all the company names available in a webpage. The webpage has got lazyloading method active so there are only 20 links become visible in each scroll. If I scroll 2 times then the number of links visible are 40 and so on. There are 1000 links available in that webpage. My below script can reach the bottom of that page handling all the scroll and fetch all the names available in that webpage.
However, it is necessary to wait a certain time after each scroll for that webpage to update the content. This is where I've used hardcoded delay
but the process of hardcoding thing is very inconsistent and sometimes it makes the browser quit before the completion of the whole operation.
How can I modify this portion .Wait 6000
to make it Explicit Wait
instead of Hardcoded Wait
.
This is what I've written so far:
Sub Getlinks()
Dim driver As New ChromeDriver, prevlen&, curlen&
Dim posts As Object, post As Object
With driver
.get "http://fortune.com/fortune500/list/"
prevlen = .FindElementsByClass("company-title").Count
Do
prevlen = curlen
.ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")
.Wait 6000 ''I like to kick out this hardcoded delay and use explicit wait in place
Set posts = .FindElementsByClass("company-title")
curlen = posts.Count
If prevlen = curlen Then Exit Do
Loop
For Each post In posts
R = R + 1: Cells(R, 1) = post.Text
Next post
End With
End Sub
I think you are almost there.
Although I don't think you can avoid waiting, the work around is to keep a number of times checking for new posts as you scroll down with a shorter wait.
Example below is to check for new posts 5 times each with 2 seconds wait, so a total of 10 seconds before declaring end of the page. Adjust these 2 parameters to suit.
Here is a completely different approach that doesn't require using a browser, instead it submits a series of web requests. With this approach, waiting for a page to load isn't a concern.
Typically, with lazy loading pages, it will submit a new request to load up the data for the page as you scroll. If you monitor the web traffic you can spot the requests made and emulate those, I have done that below.
The result should be a list of company names, in ascending order in whatever the first sheet of Excel is.
Things you'll need:
Add References to:
Edit
Changed the code to keep pulling data from the site, until there is no more items in the list. Thanks @Qharr for pointing this out.
Code
I don't know if this will help as it's still a 'hard-coded' solution but you could try a delay function rather than the wait function and see if that helps with the program exiting issue.
Define a timeout (specified period of time that will be allowed to elapse) to get rid of the hardcoded delay. The timeout needs to be hardcoded.
The differences between this and your original code are:
Code:
There you go:
Or even better, print as you go:
Here's a way to approach it using the "look for the spinner element" method discussed in one of the comments, which helps you avoid having to specify the number of elements you're expecting the page to load. The class name of the spinner actually changes depending on whether or not it's visible, which makes it pretty easy to just wait for the spinner to become visible + disappear again before getting the page elements.
This method still involves some waiting; by default, it waits 1/10th of a second after each attempt to find the spinner, either until the spinner is found or for some maximum number of attempts. But that's much faster than waiting 5 seconds every time.
Also, unrelated, but don't write stuff to cells one at a time, it's really slow. It's much faster to write it to an array first + write the entire array at once.