Suppose I have this structure:
Is there any way of getting to this:
using Ruby's libraries? I managed to get this using Nokogiri. From my tests, it appears to work, but maybe there's a better approach.
How about one that does the whole thing in two lines?
seen = Hash.new(0)
node.traverse { |n| n.unlink if (seen[n.to_xml] += 1) > 1 }
If there's a possibility of the same node appearing under two different parents, and you don't want those to be considered duplicates, you can change that second line to:
node.traverse { |n| n.unlink if (seen[(n.parent.path rescue "") + n.to_xml] += 1) > 1 }
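To see how the choice of key changes what counts as a duplicate, here is a minimal sketch in plain Ruby, with no Nokogiri involved. The `[parent_path, xml]` pairs and the `keep_first_occurrences` helper are hypothetical stand-ins for real nodes, made up for illustration; only the counting idiom (a hash that defaults to 0) comes from the lines above.

```ruby
# Hypothetical stand-ins for XML nodes: [parent_path, serialized_xml] pairs.
nodes = [
  ["/root/a", "<item>1</item>"],
  ["/root/a", "<item>1</item>"], # same parent, same content
  ["/root/b", "<item>1</item>"], # different parent, same content
  ["/root/b", "<item>2</item>"]
]

# Keeps the first entry for each key and drops later repeats.
# The block decides what the key is, just like the expression
# inside seen[...] in the one-liners above.
def keep_first_occurrences(nodes)
  seen = Hash.new(0) # missing keys start at 0, so += 1 works on first sight
  nodes.reject { |node| (seen[yield(node)] += 1) > 1 }
end

# Key on content only: identical XML is a duplicate wherever it appears.
by_content = keep_first_occurrences(nodes) { |_parent, xml| xml }

# Key on parent path plus content: only repeats under the same parent count.
by_parent_and_content = keep_first_occurrences(nodes) { |parent, xml| parent + xml }

puts by_content.size
puts by_parent_and_content.size
```

With the content-only key, the copy under `/root/b` is dropped as well; with the parent path prepended, it survives because its key differs.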
This page explains XML parsing in Ruby a little bit.
This page explains some of the reasons why you want to use a proper parser over something like regular expressions:
At a glance, the approach you're using doesn't seem horrible.