What's the smartest way to have Nokogiri select all content between the start and the stop element (including start-/stop-element)?
Check example code below to understand what I'm looking for:
require 'rubygems'
require 'nokogiri'
value = Nokogiri::HTML.parse(<<-HTML_END)
"<html>
<body>
<p id='para-1'>A</p>
<div class='block' id='X1'>
<p class="this">Foo</p>
<p id='para-2'>B</p>
</div>
<p id='para-3'>C</p>
<p class="that">Bar</p>
<p id='para-4'>D</p>
<p id='para-5'>E</p>
<div class='block' id='X2'>
<p id='para-6'>F</p>
</div>
<p id='para-7'>F</p>
<p id='para-8'>G</p>
</body>
</html>"
HTML_END
parent = value.css('body').first
# START element
@start_element = parent.at('p#para-3')
# STOP element
@end_element = parent.at('p#para-7')
The result (return value) should look like this:
<p id='para-3'>C</p>
<p class="that">Bar</p>
<p id='para-4'>D</p>
<p id='para-5'>E</p>
<div class='block' id='X2'>
<p id='para-6'>F</p>
</div>
<p id='para-7'>F</p>
Update: This is my current solution, though I think there must be something smarter:
@my_content = ""
@selected_node = true
def collect_content(_start)
if _start == @end_element
@my_content << _start.to_html
@selected_node = false
end
if @selected_node == true
@my_content << _start.to_html
collect_content(_start.next)
end
end
collect_content(@start_element)
puts @my_content
For the sake of completeness a XPath only solution :)
It builds an intersection of two sets, the following siblings of the start element and the preceding siblings of the end element.
A little divided on variables for readability:
Siblings don't include the element itself, so the start and end elements must be included in the expression separately.
Edit: Easier solution:
A way-too-smart oneliner which uses recursion:
An iterative solution:
EDIT: (Short) explanation of the asterix
It's called the splat operator. It "unrolls" an array: