How do I remove all tags below a certain node except for some elements using Nokogiri? For example, using this setup:
src = <<EOS
<html>
<body>
<p>
Hello <i>world</i>!
This is <em>another</em> line.
<p><h3>And a paragraph <em>with</em> a heading.</h3></p>
<b>Third line.</b>
</p>
</body>
</html>
EOS
doc = Nokogiri::HTML(src)
para = doc.at('//p')
How can I remove all elements in the paragraph (while preserving their content) except <i> and <b> elements? So the result would be:
<html>
<body>
<p>
Hello <i>world</i>!
This is another line.
And a paragraph with a heading.
<b>Third line.</b>
</p>
</body>
</html>
Flack gave the correct answer using an XSLT template; here is a full Nokogiri-based example:
Output:
Applied to your sample, result will be:
EDIT as requested in comments.
This will remove all elements (markup, not string values) in p, except for i and b elements.

Just to round out the examples, here's one using Nokogiri without XSLT:
Notice that Nokogiri was not happy with the markup and did some fix-up. Also, the actual code to strip the tags was only three lines and could have been written on one.