I am trying to get the text for "Last sold date" from this HTML:
<td class="browse-cell-date">
<span title="Last sold date">
May 2002
</span>
<button class="btn btn-previous-sales js-btn-previous-sales">
Previous sales (1) <i class="icon icon-down-open-1"/>
</button>
<div class="previous-sales-panel is-hidden">
<span style="display: block;">
Aug 1997
<span class="fright">£60,000</span>
</span>
</div>
</td>
I tried:
date = val.search(".//td[@class='browse-cell-date']").children[1]
It gave me the span I wanted, but after adding .text to it, nothing was returned.
I'd start with:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<td class="browse-cell-date">
<span title="Last sold date">
May 2002
</span>
<button class="btn btn-previous-sales js-btn-previous-sales">
Previous sales (1) <i class="icon icon-down-open-1"/>
</button>
<div class="previous-sales-panel is-hidden">
<span style="display: block;">
Aug 1997
<span class="fright">£60,000</span>
</span>
</div>
</td>
EOT
sold_date = doc.at('span[title="Last sold date"]') # => #<Nokogiri::XML::Element:0x3ffc7e84c35c name="span" attributes=[#<Nokogiri::XML::Attr:0x3ffc7e84c2f8 name="title" value="Last sold date">] children=[#<Nokogiri::XML::Text:0x3ffc7e82bc10 "\n May 2002 \n ">]>
sold_date.text # => "\n May 2002 \n "
sold_date.text.strip # => "May 2002"
So
doc.at('span[title="Last sold date"]').text.strip # => "May 2002"
will do it.
at is like search('some selector').first, so use it for convenience. Both at and search are smart enough to figure out whether the selector is CSS or XPath most of the time, so I use those. If Nokogiri is fooled, I'll fall back to the explicit css/at_css or xpath/at_xpath variants.
Alternatively, you could use:
doc.at('td.browse-cell-date span').text.strip # => "May 2002"
doc.at('td.browse-cell-date > span').text.strip # => "May 2002"
Note: Using text with any of the search, xpath or css methods isn't a good idea. Those methods return a NodeSet, whose text method concatenates the text of every matched node into a single string, which is rarely what you expect. Consider this example:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<p>foo</p>
<p>bar</p>
</body>
</html>
EOT
doc.search('p').class # => Nokogiri::XML::NodeSet
doc.search('p').text # => "foobar"
We regularly see questions where people have done this and then need to figure out how to split the concatenated text into something useful, which is usually very difficult.
99.99% of the time, you want to use map(&:text) to extract the text of each node in a NodeSet:
doc.search('p').map(&:text) # => ["foo", "bar"]
But for your use case, simply use at, which returns a single Node, so text will do what you expect.
Try this:
page.search(".//td").children[1].attr("title")