I am using difflib.HtmlDiff to compare two files. I want the differences to be highlighted in the HTML output.
This already works when at most two characters differ on a line:
a = "2.000"
b = "2.120"
But when more characters differ on a line, the whole line is marked red (on the left side) or green (on the right side of the table):
a = "2.000"
b = "2.123"
Is this behaviour configurable? That is, can I set the number of differing characters at which a line is marked as deleted / added?
EDIT:
Example:
import difflib
diff=difflib.HtmlDiff()
print(diff.make_file(
'''
2.000
2.000
2.000
'''.splitlines(),
'''
2.001
2.010
2.011
'''.splitlines()))
Gives me this output:
Line 2 is the output I want: it highlights the differences in yellow. Line 3 looks odd to me, because the one-character change is not detected as a change but is shown as a delete / add instead. Line 4 is the same as line 3, except that the whole line is marked.
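One way to probe the three example lines is to compute the similarity ratio that SequenceMatcher reports for each pair. The sketch below assumes that difflib.Differ (which HtmlDiff builds on) only attempts intraline marking when a pair scores roughly 0.75 or higher. That threshold comes from reading CPython's difflib source, where it appears to be hard-coded rather than exposed as a parameter, so treat it as an assumption that may differ between Python versions. The names ASSUMED_CUTOFF and similarity are made up for this illustration.

```python
import difflib

# Assumption: threshold taken from reading CPython's difflib source;
# it is not a documented or configurable HtmlDiff setting.
ASSUMED_CUTOFF = 0.75


def similarity(old, new):
    """Return SequenceMatcher's similarity ratio for two strings."""
    return difflib.SequenceMatcher(None, old, new).ratio()


pairs = [("2.000", "2.001"), ("2.000", "2.010"), ("2.000", "2.011")]
for old, new in pairs:
    ratio = similarity(old, new)
    verdict = "intraline marks" if ratio >= ASSUMED_CUTOFF else "whole line marked"
    print(old, new, round(ratio, 2), verdict)
```

Under that assumption, the first two pairs score 0.8 and get character-level marks, while the third scores 0.6 and is treated as an unrelated delete plus add, which matches the output described above.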
difflib's algorithm does not claim to yield minimal edit sequences. Although that statement comes from the docs for SequenceMatcher, I suspect it applies to difflib in general, and HtmlDiff in particular.
While googling around for "python alternative difflib minimal edit" I found google-diff-match-patch. If you try out their demo for Diff with your example strings, it yields
Although the output is not exactly what you requested, it does show that it found the minimal edits.
The API docs state
which suggests looking at the source code for diff_prettyHtml might be a good starting point from which to build the HTML table you are looking for.
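As a rough illustration of building such a table by hand, here is a minimal sketch that uses only the standard library. It swaps in difflib.SequenceMatcher for the character-level comparison instead of diff-match-patch, so the edits it finds are not guaranteed to be minimal. It also pairs lines purely by position, which is a simplification that only suits inputs with the same number of lines. The helper names mark_line and diff_table and the CSS class chg are hypothetical.

```python
import difflib
from html import escape


def mark_line(old, new):
    """Return (old_html, new_html) with differing characters wrapped in spans."""
    left, right = [], []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        a, b = escape(old[i1:i2]), escape(new[j1:j2])
        if tag == "equal":
            left.append(a)
            right.append(b)
        else:
            # "replace", "delete" and "insert" are all rendered as a change.
            if a:
                left.append(f'<span class="chg">{a}</span>')
            if b:
                right.append(f'<span class="chg">{b}</span>')
    return "".join(left), "".join(right)


def diff_table(old_lines, new_lines):
    """Build a two-column HTML table; lines are paired by position only."""
    rows = []
    for old, new in zip(old_lines, new_lines):
        left, right = mark_line(old, new)
        rows.append(f"<tr><td>{left}</td><td>{right}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(diff_table(["2.000"] * 3, ["2.001", "2.010", "2.011"]))
```

Because every line pair is compared character by character regardless of how similar the lines are, no row falls back to whole-line marking. A style rule for the chg class (for example a yellow background) would still need to be added to the surrounding page.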