Diffing two HTML documents turns out to be quite a different problem from diffing plain text. For example, if I do a naive character-level LCS diff between:
Google</p>
and
Google</a></p>
the diff result is NOT:
</a>
but
/a></
I've tried most of the gems out there that claim to do HTML diffs, but all of them seem to implement a plain text-based LCS diff. Is there any gem that computes a diff while taking HTML tags into account?
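As a rough illustration of what "taking HTML tags into account" could mean, here is a minimal sketch (my own, not taken from any gem) that first splits the markup into tag tokens and text tokens with a regex, then runs LCS over those tokens instead of over characters. The names `tokenize_html` and `token_diff` are hypothetical, and the regex tokenizer is an assumption: it does not correctly handle comments containing `<`, CDATA sections, or `>` inside attribute values.

```ruby
# Hypothetical tag-aware diff: tokenize markup, then run LCS over tokens.
# Assumption: a regex tokenizer that treats "<tag ...>", "</tag>" and "<!...>"
# as single tokens; it does not correctly handle comments containing '<',
# CDATA, or '>' inside attribute values. A stray '<' becomes its own token.
TOKEN_RE = /<\/?[A-Za-z!][^<>]*>|[^<]+|</

def tokenize_html(html)
  html.scan(TOKEN_RE)
end

# Returns [op, token] pairs: :same, :del (only in old) or :add (only in new).
def token_diff(old_html, new_html)
  a = tokenize_html(old_html)
  b = tokenize_html(new_html)

  # lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
  lcs = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lcs[i][j] =
        if a[i] == b[j]
          lcs[i + 1][j + 1] + 1
        else
          [lcs[i + 1][j], lcs[i][j + 1]].max
        end
    end
  end

  # Walk the table from the start to build the edit script.
  result = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      result << [:same, a[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      result << [:del, a[i]]
      i += 1
    else
      result << [:add, b[j]]
      j += 1
    end
  end
  # Whatever is left on either side is a pure deletion or insertion.
  a.drop(i).each { |t| result << [:del, t] }
  b.drop(j).each { |t| result << [:add, t] }
  result
end
```

For the `Google</p>` vs `Google</a></p>` pair above, `token_diff` returns `[[:same, "Google"], [:add, "</a>"], [:same, "</p>"]]`, so the whole `</a>` tag is reported as the insertion rather than a fragment of it.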
After much searching for a gem to do this, I found that I can simply compare the serialized strings of two parsed Nokogiri documents:
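require 'nokogiri'

# Parse both snippets and compare the re-serialized documents, so markup
# that Nokogiri parses into the same tree produces the same string.
def should_match_html(html_text1, html_text2)
  dom1 = Nokogiri::HTML(html_text1)
  dom2 = Nokogiri::HTML(html_text2)
  expect(dom1.to_s).to eq(dom2.to_s)
end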
Then you can call it from any example in your spec:
should_match_html expected_html, actual_html
The best part is that RSpec's built-in matcher automatically shows a line-by-line diff of the mismatched lines when the comparison fails.
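If you want something similar outside RSpec, here is a crude, hypothetical stand-in (`mismatched_lines` is my own name, not an existing API) that compares two serialized strings line by line, by position, and describes each line that differs. It is not RSpec's differ, which aligns lines with an LCS algorithm; this sketch assumes both strings come from the same serializer so unchanged content stays on the same line numbers.

```ruby
# Hypothetical helper, not part of RSpec or Nokogiri: compare two serialized
# documents line by line, by position, and describe each line that differs.
# Assumption: both strings come from the same serializer, so unchanged
# content sits on the same line numbers. An inserted or deleted line shifts
# every later line, which an LCS-based differ such as RSpec's handles and
# this sketch does not.
def mismatched_lines(expected, actual)
  exp_lines = expected.lines.map(&:chomp)
  act_lines = actual.lines.map(&:chomp)
  line_count = [exp_lines.size, act_lines.size].max

  (0...line_count).map do |idx|
    exp = exp_lines[idx] # nil when the expected side has fewer lines
    act = act_lines[idx] # nil when the actual side has fewer lines
    next nil if exp == act

    "line #{idx + 1}: -#{exp.inspect} +#{act.inspect}"
  end.compact
end
```

Called on `"<p>\nGoogle\n</p>"` and `"<p>\nGoogle</a>\n</p>"`, it returns `['line 2: -"Google" +"Google</a>"']`.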
Try Samy diffy, or the html-diff gem on RubyGems.