Generate pretty diff html in Python

Posted 2019-01-16 07:40

Question:

I have two chunks of text that I would like to compare in Python to see which words/lines have been added, removed or modified (similar to a wiki's diff output).

I have tried difflib.HtmlDiff, but its output is less than pretty.

Is there a way in Python (or an external library) to generate clean-looking HTML of the diff of two text chunks? (Not just at line level, but also word/character modifications within a line.)

Answer 1:

There's diff_prettyHtml() in the diff-match-patch library from Google.
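A minimal sketch of how it is typically called, assuming the Python port published on PyPI as diff-match-patch (import name diff_match_patch); the sample strings are made up. diff_cleanupSemantic() is optional: it merges scattered character-level edits into word-sized chunks, which usually reads better as HTML.

```python
# Requires the third-party package: pip install diff-match-patch
from diff_match_patch import diff_match_patch

old_text = "The quick brown fox jumps over the lazy dog."
new_text = "The quick red fox jumped over the lazy dog."

dmp = diff_match_patch()
# Character-level diff: a list of (op, text) tuples, op is -1, 0 or 1
diffs = dmp.diff_main(old_text, new_text)
# Optional: regroup fragmentary edits into human-readable chunks (in place)
dmp.diff_cleanupSemantic(diffs)

# Inline HTML: <del> for removed text, <ins> for inserted text,
# <span> for unchanged text
html = dmp.diff_prettyHtml(diffs)
print(html)
```

The generated markup carries inline style attributes for the background colours, so it renders without extra CSS, but it can still be restyled by targeting ins and del.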



Answer 2:

Generally, if you want some HTML to render in a prettier way, you do it by adding CSS.

For instance, if you generate the HTML like this:

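import difflib
import sys

fromfile = "xxx"
tofile = "zzz"
# Mode 'U' was removed in Python 3.11; text mode already uses universal newlines
with open(fromfile) as f:
    fromlines = f.readlines()
with open(tofile) as f:
    tolines = f.readlines()

# make_file() returns one complete HTML document as a single string
diff = difflib.HtmlDiff().make_file(fromlines, tolines, fromfile, tofile)

sys.stdout.write(diff)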

then you get green backgrounds on added lines, yellow on changed lines and red on deleted lines. If I were doing this, I would take the generated HTML, extract the body, and prefix it with my own handwritten block of HTML with lots of CSS to make it look good. I'd also probably strip out the legend table and move it to the top, or put it in a div so that CSS can do that.
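One way to do that without post-processing a full document is to call make_table(), which returns only the table fragment (no legend), and wrap it in your own page. The class names diff_add, diff_chg and diff_sub are the ones difflib emits for added, changed and deleted text; the stylesheet, the render_page helper name and the sample lines are hypothetical choices for illustration.

```python
import difflib
import html

# Hypothetical stylesheet: replace with your own design
CUSTOM_CSS = """
table.diff { border-collapse: collapse; font-family: Menlo, Consolas, monospace; font-size: 13px; }
table.diff td { padding: 2px 6px; vertical-align: top; }
.diff_header { background: #f6f8fa; color: #6a737d; text-align: right; }
.diff_next { display: none; }  /* hide difflib's navigation-link column */
.diff_add { background: #e6ffed; }  /* added text */
.diff_chg { background: #fff5b1; }  /* changed text */
.diff_sub { background: #ffeef0; text-decoration: line-through; }  /* deleted text */
"""

def render_page(fromlines, tolines, fromdesc, todesc):
    """Build a standalone HTML page around difflib's side-by-side table."""
    # wrapcolumn wraps long lines instead of producing a very wide table;
    # context=True shows only changed regions plus numlines of context
    table = difflib.HtmlDiff(wrapcolumn=80).make_table(
        fromlines, tolines, fromdesc, todesc, context=True, numlines=2
    )
    return (
        "<!DOCTYPE html>\n<html><head><meta charset=\"utf-8\">"
        f"<title>{html.escape(fromdesc)} vs {html.escape(todesc)}</title>"
        f"<style>{CUSTOM_CSS}</style></head>\n"
        f"<body>{table}</body></html>\n"
    )

old = ["The quick brown fox\n", "jumps over the lazy dog.\n"]
new = ["The quick red fox\n", "jumps over the lazy dog.\n"]
page = render_page(old, new, "old.txt", "new.txt")
print(page)
```

Because the page shell is yours, moving a legend to the top or into a div becomes a matter of writing that markup yourself rather than editing difflib's output.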

Actually, I would give serious consideration to just fixing up the difflib module (which is written in Python) to generate better HTML and contributing it back to the project. If you have a CSS expert to help you, or are one yourself, please consider doing this.
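Short of patching difflib itself, the word-level highlighting the question asks for can be sketched with difflib.SequenceMatcher run over lists of words instead of characters. This is only an illustrative helper (inline_word_diff is a made-up name, not part of difflib), and it assumes that splitting on whitespace is good enough tokenization.

```python
import difflib
import html
import re

def inline_word_diff(old: str, new: str) -> str:
    """Return one HTML fragment with <del>/<ins> around changed words."""
    # The capturing group keeps whitespace runs as tokens, preserving spacing
    a = re.split(r"(\s+)", old)
    b = re.split(r"(\s+)", new)
    sm = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        old_chunk = html.escape("".join(a[i1:i2]))
        new_chunk = html.escape("".join(b[j1:j2]))
        if tag == "equal":
            out.append(old_chunk)
        else:
            # 'replace' emits both parts, 'delete' only old, 'insert' only new
            if old_chunk:
                out.append(f"<del>{old_chunk}</del>")
            if new_chunk:
                out.append(f"<ins>{new_chunk}</ins>")
    return "".join(out)

print(inline_word_diff("The quick brown fox jumps", "The quick red fox jumped"))
# The quick <del>brown</del><ins>red</ins> fox <del>jumps</del><ins>jumped</ins>
```

Running the same function on each pair of changed lines (for example the "replace" opcodes of a line-level SequenceMatcher) would give a combined line-plus-word view; styling ins and del with CSS is then enough to make it look like a wiki diff.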



Answer 3:

I recently posted a Python script that does just this: diff2HtmlCompare. Under the hood it wraps difflib and uses Pygments for syntax highlighting.



Answer 4:

First clean up both HTML documents with lxml.html, then check the difference with difflib.
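A minimal sketch of that approach, assuming lxml is installed (pip install lxml): both inputs are parsed with lxml.html.fromstring() and re-serialized with lxml.html.tostring(), so differences in tag closing, attribute quoting or entity spelling disappear before difflib compares them. The normalize_html helper name and the sample markup are made up for illustration.

```python
import difflib

import lxml.html  # third-party: pip install lxml

def normalize_html(markup: str) -> list[str]:
    """Parse possibly sloppy HTML and re-serialize it in a canonical form."""
    tree = lxml.html.fromstring(markup)
    # pretty_print adds line breaks where it can without changing text
    # content, which gives difflib more meaningful lines to compare
    text = lxml.html.tostring(tree, pretty_print=True, encoding="unicode")
    return text.splitlines(keepends=True)

old_html = "<ul><li>one<li>two</ul>"              # unclosed <li> tags
new_html = "<ul><li>one</li><li>three</li></ul>"  # explicitly closed tags

diff_lines = list(difflib.unified_diff(
    normalize_html(old_html), normalize_html(new_html),
    fromfile="old.html", tofile="new.html",
))
print("".join(diff_lines))
```

The result is still a line-based diff of markup; feeding the normalized lines to difflib.HtmlDiff instead of unified_diff would give the side-by-side HTML view.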



Answer 5:

A copy of my own answer to another question.


What about DaisyDiff? (Java and PHP versions are available.)

The following features are really nice:

  • Works with badly formed HTML that can be found "in the wild".
  • The diffing is more specialized in HTML than XML tree differs. Changing part of a text node will not cause the entire node to be changed.
  • In addition to the default visual diff, HTML source can be diffed coherently.
  • Provides easy to understand descriptions of the changes.
  • The default GUI allows easy browsing of the modifications through keyboard shortcuts and links.


Answer 6:

Since the .. library from Google seems to have no active development anymore, I suggest using diff_py.

From the GitHub page:

A simple diff tool written in Python. The diff result can be printed to the console or to an HTML file.



Answer 7:

"Not just line level, but also word/character modifications within a line"

xmldiff seems to be a nice package for this purpose, especially when you have XML/HTML to compare. Read more in its documentation.