I'd like to convert the output of diff (on a Markdown file) to Markdown with <strike> and <em> tags, so that I can see what has been removed from or added to a new version of a document. (This kind of treatment is very common for legal documents.)
Example of hoped-for output:
Why do <strike>we</strike><em>We</em> study programming languages? <strike>not</strike><em>Not</em> in order to ...
One of the many difficulties is that diff's output is line-oriented, whereas I want to see differences in individual words. Does anyone have suggestions as to what algorithm to use, or what software to build on?
Use wdiff. It already does the word-by-word comparison you're looking for; converting its output to markdown should take just a few simple regular expressions.
For example:
$ cat foo
Why do we study programming languages? Not in order to
$ cat bar
We study programming languages not in order to
$ wdiff foo bar
[-Why do we-]{+We+} study programming [-languages? Not-] {+languages not+} in order to
$ wdiff foo bar | sed 's|\[-|<strike>|g;s|-]|</strike>|g;s|{+|<em>|g;s|+}|</em>|g'
<strike>Why do we</strike><em>We</em> study programming <strike>languages? Not</strike> <em>languages not</em> in order to
Edit: Actually, wdiff has some options that make it even easier:
$ wdiff -w '<strike>' -x '</strike>' -y '<em>' -z '</em>' foo bar
<strike>Why do we</strike><em>We</em> study programming <strike>languages? Not</strike> <em>languages not</em> in order to
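If the conversion needs to live inside a script rather than a shell pipeline, the same marker-to-tag substitution can be sketched in Python. This is only an illustration: it assumes wdiff's default markers ([-deleted-] and {+inserted+}) and follows the question's convention of <strike> for removed text and <em> for added text. The function name wdiff_to_markdown is made up for this example.

```python
import re

# wdiff's default markers: [-deleted-] and {+inserted+}.
# Non-greedy matches keep adjacent runs separate; DOTALL lets a run span lines.
DELETED = re.compile(r"\[-(.*?)-\]", re.DOTALL)
INSERTED = re.compile(r"\{\+(.*?)\+\}", re.DOTALL)


def wdiff_to_markdown(text: str) -> str:
    """Wrap deleted runs in <strike> and inserted runs in <em> (hypothetical helper)."""
    text = DELETED.sub(r"<strike>\1</strike>", text)
    text = INSERTED.sub(r"<em>\1</em>", text)
    return text


sample = ("[-Why do we-]{+We+} study programming "
          "[-languages? Not-] {+languages not+} in order to")
print(wdiff_to_markdown(sample))
```

Like the sed version, this will misfire if the document itself contains the literal sequences [- or {+, so choosing custom delimiters through wdiff's own options is the more robust route.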
Use Markdown-Diff to have the word diff annotated in your original document. It formats the output of wdiff or git --word-diff in Markdown, so you can use your favorite Markdown previewer or compiler to review changes. (I wrote Markdown-Diff, inspired by Adam Rosenfield's answer.)
You didn't specify the target platform, but if you are using .NET you should definitely check out this article on CodeProject:
http://www.codeproject.com/KB/recipes/diffengine.aspx
The diff engine performs the comparison and returns a logical object to which you can apply your own display formatting. I have used it in several projects, one of which was a web-based text comparison tool, and we were able to introduce all the markup you describe above. I have also extended the engine with new classes to do custom line-type comparisons.
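The "compute the diff as data, then format it yourself" approach is not specific to .NET. As a rough sketch of the idea, and not the CodeProject engine's API, Python's standard difflib can compare two word lists and return edit operations that you then render however you like. The function name word_diff is made up for this example, and splitting on whitespace is a simplifying assumption that discards the original line breaks and spacing.

```python
from difflib import SequenceMatcher


def word_diff(old: str, new: str) -> str:
    """Return `new` vs `old` as text with <strike>/<em> markup (hypothetical helper)."""
    a, b = old.split(), new.split()  # assumption: words are whitespace-separated
    out = []
    # get_opcodes() yields (tag, i1, i2, j1, j2) describing how a[i1:i2]
    # turns into b[j1:j2]; tag is 'equal', 'delete', 'insert' or 'replace'.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(" ".join(a[i1:i2]))
        if tag in ("delete", "replace"):
            out.append("<strike>" + " ".join(a[i1:i2]) + "</strike>")
        if tag in ("insert", "replace"):
            out.append("<em>" + " ".join(b[j1:j2]) + "</em>")
    return " ".join(out)


print(word_diff("Why do we study programming languages? Not in order to",
                "We study programming languages not in order to"))
```

Because the formatting step only sees the opcodes, swapping <strike>/<em> for other markup means changing two string literals rather than the comparison itself.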