Suppose I have this structure:
<one>
  <two>
    <three>3</three>
  </two>
  <two>
    <three>4</three>
  </two>
  <two>
    <three>3</three>
  </two>
</one>
Is there any way of getting to this:
<one>
  <two>
    <three>3</three>
  </two>
  <two>
    <three>4</three>
  </two>
</one>
using Ruby's libraries? I managed to get this using Nokogiri. From my tests, it appears to work, but maybe there's another approach, a better one.
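How about one that does the whole thing in two lines? It picks out the repeated elements first and unlinks them afterwards, so removing a child never changes how its parent serializes partway through the scan (Nokogiri's traverse visits children before their parent, so unlinking inside it can leave a half-emptied duplicate behind).
seen = Hash.new(0)
node.xpath('.//*').select {|n| (seen[n.to_xml] += 1) > 1}.each(&:unlink)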
If there's a possibility of the same node appearing under two different parents, and you don't want those to be considered duplicates, you can change that second line to:
node.traverse {|n| n.unlink if (seen[(n.parent.path rescue "") + n.to_xml] += 1) > 1}
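For comparison, here is a minimal sketch of the same idea using REXML instead of Nokogiri, in case you'd rather not depend on a gem with native extensions. It only looks at the direct children of one parent, which is a narrower rule than the snippets above, and the helper name remove_duplicate_children is made up for this example. Treat it as an illustration under those assumptions rather than a drop-in replacement.

```ruby
require 'rexml/document'

xml = <<~XML
  <one>
    <two>
      <three>3</three>
    </two>
    <two>
      <three>4</three>
    </two>
    <two>
      <three>3</three>
    </two>
  </one>
XML

# Hypothetical helper (not part of REXML): removes every child element of
# `parent` whose serialized form repeats that of an earlier sibling.
def remove_duplicate_children(parent)
  seen = {}
  # to_a takes a snapshot, so deleting while iterating is safe.
  parent.elements.to_a.each do |child|
    key = child.to_s
    if seen[key]
      parent.delete_element(child)
    else
      seen[key] = true
    end
  end
  parent
end

doc = REXML::Document.new(xml)
remove_duplicate_children(doc.root)

# Collect the text of each surviving <three> to check the result.
remaining = doc.root.elements.to_a.map { |e| e.elements['three'].text }
puts remaining.inspect
```

Because the comparison is on the serialized string, two elements only count as duplicates if their whitespace and attribute order match too. The whitespace text node that followed the removed element is left in place, so the output may contain an extra blank line.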
This page explains a little about XML parsing in Ruby: http://developer.yahoo.com/ruby/ruby-xml.html
This page explains some of the reasons why you want to use a proper parser over something like regular expressions:
http://htmlparsing.icenine.ca
At a glance, the approach you're using doesn't seem horrible.