I'm not sure whether it's a matter of syntax or of differences between versions, but I can't figure this out. I want to extract the data inside a (never-closed) `<td>`, from the `<h2>` tag up to the `<h3>` tag. Here is what the HTML looks like:
```html
<td valign="top" width="350">
<br><h2>NameIWant</h2><br>
<br>Town<br>
PhoneNumber<br>
<a href="mailto:emailIwant@nowhere.com" class="links">emailIwant@nowhere.com</a>
<br>
<a href="http://websiteIwant.com" class="links">websiteIwant.com</a>
<br><br>
<br><img src="images/spacer.gif"/><br>
<h3><b>I want to stop before this!</b></h3>
Lorem Ipsum Yadda Yadda<br>
<img src="images/spacer.gif" border="0" width="20" height="11" alt=""/><br>
<td width="25">
<img src="images/spacer.gif" border="0" width="20" height="8" alt=""/>
<td valign="top" width="200"><img src="images/spacer.gif"/>
<br>
<br>
<table cellspacing="0" cellpadding="0" border="0"/>205"><tr><td>
<a href="http://dontneedthis.com">
</a></td></tr><br>
<table border="0" cellpadding="3" cellspacing="0" width="200">
...
```
The `<td valign="top" width="350">` isn't closed until the very bottom of the page, which I think might be why I'm having problems.
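My Ruby code looks like this:

```ruby
require 'open-uri'
require 'nokogiri'

# Fetch the page (URI.open is the open-uri entry point in current Ruby)
@doc = Nokogiri::XML(URI.open("http://www.url.com"))
# Select the <td> carrying both attributes (no space between the [...] filters)
content = @doc.css('td[valign="top"][width="350"]')
name = content.xpath('//h2').text
puts name # Returns NameIWant
townNumberLinks = content.search('//following::h2')
puts townNumberLinks # Returns <h2>NameIWant</h2>
```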
As I understand it, the `following` axis "selects everything in the document after the closing tag of the current node". If I try to use `preceding` instead, like:

```ruby
townNumberLinks = content.search('//preceding::h3')
# I get: <h3><b>I want to stop before this!</b></h3>
```
Hope I made it clear what I'm trying to do. Thanks!