I am getting stuck on a weird case of pagination. I am scraping search results from https://cotthosting.com/NYRocklandExternal/LandRecords/protected/SrchQuickName.aspx
I have search results that fall into 4 categories.
1) There are no search results
2) There is one results page
3) There is more than one results page but fewer than 12 results pages
4) There are more than 12 results pages.
Case 1 is easy: I just pass.
# find_elements (plural) returns a list, so len() works and an empty list means no results
results = driver.find_elements_by_class_name('GridView')
if len(results) == 0:
    pass
For cases 2 and 3, I check whether the pager table contains at least one link and, if so, click through it.
else:
    results_table = bsObj.find('table', {'class': 'GridView'})
    sub_tables = results_table.find_all('table')
    next_page_links = sub_tables[1].find_all('a')
    if len(next_page_links) == 0:
        # Only one results page
        scrapeResults()
    else:
        scrapeResults()
        #### GO TO NEXT PAGE UNTIL THERE IS NO NEXT PAGE
Question for cases 2 and 3: what could I check for here as my loop control?
The links are hrefs to pages 2, 3, etc. The tricky part is this: if I am on page 1, how do I make sure I am going to page 2, and when I am on page 2, how do I make sure I am going to page 3? The html for the results page list on page 1 is as follows:
<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;">
<tr>
<td>Page: <span>1</span></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$2')">2</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$3')">3</a></td>
</tr>
</table>
I can zero in on this table specifically using sub_tables[1] (see the bs4 code above for cases 2 and 3).
The problem is that there is no next button I could use. Nothing else in the html changes from one results page to the next. The only thing that identifies the current page is the number in the span right before the links. I would also like the loop to stop when it reaches the last page.
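To make the question concrete, here is a minimal sketch of the kind of control I have in mind: read the current page number from the span, then look for a postback link whose argument is that number plus one. It uses plain `re` on the pager html rather than bs4 so it runs on its own; the function names (`current_page`, `linked_pages`, `next_page`) are just illustrative, and I am assuming the span always holds the current page number as it does in the snippet above.

```python
import re

# Pager markup copied from the page 1 snippet above.
PAGER_HTML = """
<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;">
<tr>
<td>Page: <span>1</span></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$2')">2</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$3')">3</a></td>
</tr>
</table>
"""


def current_page(pager_html):
    """Return the page number shown in the <span>, or None if there is none."""
    match = re.search(r"<span>(\d+)</span>", pager_html)
    return int(match.group(1)) if match else None


def linked_pages(pager_html):
    """Return the set of numeric page targets found in the postback hrefs."""
    return {int(n) for n in re.findall(r"'Page\$(\d+)'", pager_html)}


def next_page(pager_html):
    """Return current page + 1 if a link to it exists, otherwise None (stop)."""
    page = current_page(pager_html)
    if page is None:
        return None
    target = page + 1
    return target if target in linked_pages(pager_html) else None
```

If `next_page` returns a number, I would presumably click the link with that target in Selenium and re-parse the page; if it returns None, I would treat the current page as the last one. I have not verified that this holds on every results page.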
For case 4, the html looks like this:
<table cellspacing="0" cellpadding="0" border="0" style="border-collapse:collapse;">
<tr>
<td>Page: <span>1</span></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$2')">2</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$3')">3</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$4')">4</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$5')">5</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$6')">6</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$7')">7</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$8')">8</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$9')">9</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$10')">10</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$11')">...</a></td>
<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$Last')">Last</a></td>
</tr>
</table>
The last two links are "..." to show that there are more results pages, and "Last" to signify the last page. However, the Last link exists on every page, and it is only on the last page itself that it is not an active link.
Question for case 4: how could I check whether the Last link is clickable and use that as my stopping point?
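As a rough sketch of what I mean by "clickable": the check below treats Last as active only when it is rendered as an anchor that carries an href. The `INACTIVE` sample is an assumption on my part, since I have not confirmed exactly how the site renders Last on the final page; the function name `last_link_is_active` is also just illustrative.

```python
import re


def last_link_is_active(pager_html):
    """Return True if 'Last' is rendered as an <a> element that has an href."""
    pattern = r"<a\b[^>]*\bhref\s*=[^>]*>\s*Last\s*</a>"
    return re.search(pattern, pager_html, flags=re.IGNORECASE) is not None


# Active form, copied from the case 4 markup above.
ACTIVE = """<td><a href="javascript:__doPostBack('ctl00$cphMain$lrrgResults$cgvNamesDir','Page$Last')">Last</a></td>"""

# Assumed inactive form for the final page (not confirmed against the site).
INACTIVE = "<td><span>Last</span></td>"
```

With this, the loop could stop once `last_link_is_active` returns False for the freshly loaded page. An anchor without an href would also count as inactive here.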
Bigger question for case 4: how do I maneuver through the "..." link to reach the other results pages? The results page list holds 12 entries at most, i.e. the nearest ten pages to the current page, the "..." link to more pages, and the Last link. So I don't know what to do if my results have, say, 88 pages.
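One thing I noticed in the markup above is that the "..." link's href targets `Page$11`, which is simply the next page after the visible block. So matching on the postback argument instead of the link text might be enough to walk past the "..." link. The sketch below simulates that idea for 88 pages. `fake_pager` is a hypothetical stand-in for the site's pager (blocks of ten pages, a "..." link on either side, a Last link), not something I have verified against the real pages, and `next_target` and `walk` are illustrative names.

```python
import re

CONTROL = "ctl00$cphMain$lrrgResults$cgvNamesDir"


def link(arg, text):
    """Build one postback link cell in the same shape as the real markup."""
    return "<td><a href=\"javascript:__doPostBack('%s','Page$%s')\">%s</a></td>" % (
        CONTROL, arg, text)


def fake_pager(page, total, window=10):
    """Hypothetical pager: a block of `window` pages, current page as a <span>."""
    start = (page - 1) // window * window + 1
    end = min(start + window - 1, total)
    cells = []
    if start > 1:
        cells.append(link(start - 1, "..."))   # back to the previous block
    for n in range(start, end + 1):
        if n == page:
            cells.append("<td><span>%d</span></td>" % n)
        else:
            cells.append(link(n, str(n)))
    if end < total:
        cells.append(link(end + 1, "..."))     # forward to the next block
    if page < total:
        cells.append(link("Last", "Last"))
    return "<table><tr>%s</tr></table>" % "".join(cells)


def next_target(pager_html):
    """Return 'Page$N' for N = current page + 1 if such a link exists, else None."""
    match = re.search(r"<span>(\d+)</span>", pager_html)
    if match is None:
        return None
    wanted = "Page$%d" % (int(match.group(1)) + 1)
    # Quotes are included so that Page$1 does not match inside Page$10.
    return wanted if "'%s'" % wanted in pager_html else None


def walk(total):
    """Visit pages in order, following the next target until none is left."""
    visited, page = [], 1
    while True:
        visited.append(page)                   # scrapeResults() would go here
        target = next_target(fake_pager(page, total))
        if target is None:
            return visited
        page = int(target.split("$")[1])
```

In the real scraper, the `fake_pager` call would be replaced by re-reading the pager from the live page after each click. Whether the site's pager really behaves like this simulation is exactly what I am unsure about.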
I am linking a dump of a full sample page: https://ghostbin.com/paste/nrb27