I am new to Nokogiri and so far most familiar with CSS selectors. I am trying to parse information from a table; below is a sample of the table and the code I'm using. I'm stuck on the appropriate if statement, as it seems to return the whole contents of the table.
Table:
<div class="holder">
  <div class="row">
    <div class="c1">
      <!-- Content I Don't need -->
    </div>
    <div class="c2">
      <span class="data">
        <!-- Content I Don't Need -->
      </span>
    </div>
  </div>
  ...
  <div class="row">
    <div class="c1">
      SPECIFIC TEXT
    </div>
    <div class="c2">
      <span class="data">
        What I want
      </span>
    </div>
  </div>
</div>
My script (if SPECIFIC TEXT is found in the table, it returns every "div.c2 span.data" value, so I've either misunderstood do loops or if statements):
data = []
page = agent.get(url)
page.search('div.row').each do |row_data|
  if row_data.search('div.c1:contains("/SPECIFIC TEXT/")').text.strip
    temp = row_data.search('div.c2 span.data').text.strip
    data << temp
  end
end
I'd do this with a single CSS selector instead of a loop.
How those selectors would work here:
Attribute Contains Selector [name*="value"]
- Selects elements that have the specified attribute with a value containing a given substring.
Child Selector (“parent > child”)
- Selects all direct child elements specified by "child" of elements specified by "parent".
Next Adjacent Selector (“prev + next”)
- Selects all next elements matching "next" that are immediately preceded by a sibling "prev".
Class Selector (“.class”)
- Selects all elements with the given class.
Descendant Selector (“ancestor descendant”)
- Selects all elements that are descendants of a given ancestor.
There's no need to stop and insert Ruby logic when you can extract what you need in a single CSS selector.
This will include only the elements that match the whole selector (i.e. those that follow SPECIFIC TEXT).
Here's where your logic may have gone wrong:
Your loop first searches the row for the specific text, then, if that check passes, returns everything matching the second query, which has the same starting point and is not tied to the first match. The key is the + in the CSS selector, which returns only the element immediately following the match (i.e. its next sibling element). I'm making an assumption, of course, that the next element is always what you want.