I would like to show differences between two blocks of text. Rather than comparing lines of text or individual characters, I would like to just compare words separated by specified characters ('\n', ' ', '\t' for example). My main reasoning for this is that the block of text that I'll be comparing generally doesn't have many line breaks in it and letter comparisons can be hard to follow.
I've come across the following O(ND) logic in C# for comparing lines and characters, but I'm sort of at a loss for how to modify it to compare words.
In addition, I would like to keep track of the separators between words and make sure they're included with the diff. So if a space is replaced by a hard return, I would like that to come up as a diff.
I'm using Asp.Net (c#) to display the entire block of text including the deleted original text and added new text (both will be highlighted to show that they were deleted/added). A solution that works with those technologies would be appreciated.
Any advice for how to accomplish this is appreciated.
Well
String.Split
with '\n', ' ' and '\t' as the split characters will return you an array of words in your block of text.You could then compare each array for differences. A simple 1:1 comparison would tell you if any word had been changed. Comparing:
and:
would give you that
world
and changed tothere
.What it wouldn't tell you was if words had been inserted or removed and you would still need to parse the text blocks character by character to see if any of the separator characters had been changed.
string string1 = "hello world how are you"; string string2 = "hello there how are you";
Other than a few general optimizations, if you need to include the separators in the comparison you are essentially doing a character by character comparison with breaks. Though you could use the O(ND) you linked, you are going to make as many changes to it as you would basically writing your own.
The main problem with difference comparison is finding the continuation (if I delete a single word, but leave the rest the same).
If you want to use their code start with the example and do not write the deleted characters, if there are replaced characters in the same place, do not output this result. You then need to compute the longest continuous run of "changed" words, highlight this string and output.
Sorry thats not much of an answer, but for this problem the answer is basically writing and tuning the function.
Microsoft has released a diff project on CodePlex that allows you to do word, character, and line diffs. It is licensed under Microsoft Public License (Ms-PL).
https://github.com/mmanela/diffplex