I'm using Mechanize to scrape Google Wallet for order data. I can capture all the data from the first page, but I need to follow the link to subsequent pages automatically to get more records.
The #purchaseOrderPager-pagerNextButton element moves to the next page, so I need to click it to keep going. The element looks like this:
<a id="purchaseOrderPager-pagerNextButton" class="kd-button small right"
href="purchaseorderlist?startTime=0&...
;currentPageStart=1&currentPageEnd=25&inputFullText=">
<img src="https://www.gstatic.com/mc3/purchaseorder/page-right.png"></a>
However, when I try the following I get an error:
next_page = @orders_page.search("#purchaseOrderPager-pagerNextButton")
next_page.click
The error:
undefined method `click' for #<Nokogiri::XML::NodeSet:0x007f9019095550> (NoMethodError)
click is a method of the Mechanize class.
Try the following form:
next_page = @orders_page.at("#purchaseOrderPager-pagerNextButton")
mechanize_instance.click(next_page)
NOTE: Replace mechanize_instance with your actual Mechanize variable.
Your version doesn't work because #search returns a Nokogiri::XML::NodeSet instance. A NodeSet is a collection of nodes and has no #click method. In your case next_page is a NodeSet that holds only one element, and calling #first on it gives you the Nokogiri::XML::Node, which here is also a Nokogiri::XML::Element.
Write it as below:
next_page = @orders_page.search("#purchaseOrderPager-pagerNextButton").first
Or, better, use the #at method:
next_page = @orders_page.at("#purchaseOrderPager-pagerNextButton")
Now, #click is a method of Mechanize::Page::Link instances. Here is the source:
# File lib/mechanize/page/link.rb, line 29
def click
  @mech.click self
end
Here is the full code:
next_page = @orders_page.at("#purchaseOrderPager-pagerNextButton")
# mech is your Mechanize object.
next_link = Mechanize::Page::Link.new( next_page, mech, @orders_page )
next_link.click
Mechanize#click accepts a string with the text of the anchor or button to click on, and it accepts a Nokogiri::XML::Node as well. So we can do:
mech.click next_page
Let's see why the above code works. These are the relevant lines from the Mechanize#click source:
referer = current_page()
href = link.respond_to?(:href) ? link.href :
  (link['href'] || link['src'])
get href, [], referer
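The href lookup in those lines is plain duck typing, so it can be tried out with the standard library only. The sketch below is not Mechanize code: LinkLike and the two hashes are hypothetical stand-ins for a Mechanize::Page::Link, an anchor node, and an image node, and the example.com referer URL is invented. It also assumes the relative href ends up resolved against the referer page's URL, which URI.join approximates here.

```ruby
require 'uri'

# Same lookup as in the snippet above: prefer an #href method,
# otherwise fall back to the 'href' attribute, then to 'src'.
def extract_href(link)
  link.respond_to?(:href) ? link.href : (link['href'] || link['src'])
end

# Hypothetical stand-ins, not real Mechanize or Nokogiri objects.
LinkLike   = Struct.new(:href)                                      # has an #href method
link_like  = LinkLike.new('purchaseorderlist?currentPageStart=26')
node_like  = { 'href' => 'purchaseorderlist?currentPageStart=26' }  # only answers to []
image_like = { 'src' => 'page-right.png' }                          # no href, only src

# Invented URL of the page that contains the link.
referer = 'https://example.com/orders/purchaseorderlist'

[link_like, node_like, image_like].each do |link|
  href = extract_href(link)
  # A relative href is resolved against the URL of the referring page.
  puts URI.join(referer, href)
end
```

Because a node from #at answers to ['href'], it takes the fallback branch, which is why mech.click next_page works without wrapping the node in a Mechanize::Page::Link first.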