Python Difflib - How to Get SDiff Sequences with “

Posted 2019-07-17 16:03

I am reading the documentation for Python's difflib. According to the docs, each line of a Differ delta begins with a two-letter code:

Code    Meaning
'- '    line unique to sequence 1
'+ '    line unique to sequence 2
'  '    line common to both sequences
'? '    line not present in either input sequence 

But what about the "Change" operation? How do I get a "c " instruction similar to the results in Perl's sdiff?

3 answers
在下西门庆
Reply #2 · 2019-07-17 16:29

Take a look at this script:

sdiff.py @ hungrysnake.net

http://hungrysnake.net/doc/software__sdiff_py.html

Perl's sdiff (Algorithm::Diff) does not take the "matching rate" into account, but Python's sdiff.py does. =)

I have 2 text files.

$ cat text1.txt
aaaaaa
bbbbbb
cccccc
dddddd
eeeeee
ffffff

$ cat text2.txt
aaaaaa
bbbbbb
xxxxxxx
ccccccy
zzzzzzz
eeeeee
ffffff

The sdiff command, or Perl's sdiff (Algorithm::Diff), gives this side-by-side diff:

$ sdiff text1.txt text2.txt
aaaaaa          aaaaaa
bbbbbb          bbbbbb
cccccc      |   xxxxxxx
dddddd      |   ccccccy
            >   zzzzzzz
eeeeee          eeeeee
ffffff          ffffff

sdiff does not consider the "matching rate" =(
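
For comparison, a positional pairing like the sdiff output above can be approximated with difflib alone. This is only a sketch and not what sdiff.py does: it uses SequenceMatcher.get_opcodes(), whose 'replace' blocks are the closest thing difflib has to a "change" operation, and it pairs the lines inside each block by position, so it ignores the matching rate as well. The helper name side_by_side_codes and the 'u'/'c' codes are assumptions made for this example.

```python
import difflib
from itertools import zip_longest

def side_by_side_codes(a, b):
    """Pair lines of a and b by position, sdiff-style (hypothetical helper).

    Returns (code, left, right) tuples where code is 'u' (unchanged),
    'c' (changed), '-' (only in a) or '+' (only in b).
    """
    rows = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == 'equal':
            rows.extend(('u', left, right)
                        for left, right in zip(a[i1:i2], b[j1:j2]))
            continue
        # 'replace', 'delete' and 'insert' blocks: pair by position and let
        # the longer side spill over as pure deletions or insertions.
        for left, right in zip_longest(a[i1:i2], b[j1:j2]):
            if right is None:
                rows.append(('-', left, None))
            elif left is None:
                rows.append(('+', None, right))
            else:
                rows.append(('c', left, right))
    return rows

text1 = ['aaaaaa', 'bbbbbb', 'cccccc', 'dddddd', 'eeeeee', 'ffffff']
text2 = ['aaaaaa', 'bbbbbb', 'xxxxxxx', 'ccccccy', 'zzzzzzz', 'eeeeee', 'ffffff']

for code, left, right in side_by_side_codes(text1, text2):
    print(f"{code} {left or '':<10}{right or ''}")
```

On the two sample files this pairs cccccc with xxxxxxx and dddddd with ccccccy, the same pairing as the sdiff output above.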

With sdiff.py I get this:

$ sdiff.py text1.txt text2.txt
--- text1.txt (utf-8)
+++ text2.txt (utf-8)
 1|aaaaaa             1|aaaaaa
 2|bbbbbb             2|bbbbbb
  |            >      3|xxxxxxx
 3|cccccc      |      4|ccccccy
 4|dddddd      <       |
  |            >      5|zzzzzzz
 5|eeeeee             6|eeeeee
 6|ffffff             7|ffffff

[     ]      |      + 
[ <-  ]     3|cccccc  
[  -> ]     4|ccccccy 

sdiff.py does consider the "matching rate" =)

I prefer the result from sdiff.py. Don't you?

Anthone
Reply #3 · 2019-07-17 16:30

difflib has no direct 'c '-style code for changed lines like the one in Perl's sdiff, but you can easily derive one. In difflib's delta, a changed line is also tagged '- ', but unlike a line that was actually deleted, it comes with a line tagged '? ' close by in the delta (right after the '- ' line, or right after the '+ ' line that follows it). That '? ' line signals that the line was changed rather than deleted, and it also acts as a guide showing where in the line the changes are.

So, if a line in the delta is tagged with '- ', then there are four different cases depending on the next few lines of the delta:

Case 1: The line is modified by inserting some characters

- The good bad
+ The good the bad
?          ++++

Case 2: The line is modified by deleting some characters

- The good the bad
?          ----
+ The good bad

Case 3: The line is modified by deleting and inserting and/or replacing some characters:

- The good the bad and ugly
?      ^^ ----
+ The g00d bad and the ugly
?      ^^          ++++

Case 4: The line is deleted

- The good the bad and the ugly
+ Our ratio is less than 0.75!

As you can see, the lines tagged with '? ' show exactly where each kind of modification was made.
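
The four cases can be reproduced directly with difflib.Differ. A small sketch, in which the helper name delta_codes is made up for this example, prints each delta together with the sequence of two-character codes it contains:

```python
import difflib

def delta_codes(old, new):
    """Compare two single lines with Differ (hypothetical helper).

    Returns (delta_lines, codes), where codes holds the two-character
    prefix of every delta line.
    """
    delta = [line.rstrip('\n')
             for line in difflib.Differ().compare([old], [new])]
    return delta, [line[:2] for line in delta]

cases = [
    ('The good bad', 'The good the bad'),                                # case 1
    ('The good the bad', 'The good bad'),                                # case 2
    ('The good the bad and ugly', 'The g00d bad and the ugly'),          # case 3
    ('The good the bad and the ugly', 'Our ratio is less than 0.75!'),   # case 4
]

for old, new in cases:
    delta, codes = delta_codes(old, new)
    print('\n'.join(delta))
    print('codes:', codes)
    print()
```

Looking only at the codes is enough to tell a changed line from a deleted one: a '? ' appears somewhere after the '- ' in cases 1 to 3 and is absent in case 4.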

Note that difflib treats a line as deleted rather than changed if ratio() for the two lines being compared is less than 0.75. I found this value through testing.
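
The 0.75 figure can be checked against the example lines with SequenceMatcher.ratio(). A minimal sketch, assuming the default settings (no junk function); line_ratio is a name invented for this example:

```python
import difflib

def line_ratio(old, new):
    """Similarity of two lines in [0, 1], as computed by SequenceMatcher.ratio()."""
    return difflib.SequenceMatcher(None, old, new).ratio()

# Case 1 pair: similar enough to be reported as a change.
print(round(line_ratio('The good bad', 'The good the bad'), 3))

# Case 4 pair: too different, reported as a deletion plus an insertion.
print(round(line_ratio('The good the bad and the ugly',
                       'Our ratio is less than 0.75!'), 3))
```

The first pair scores above 0.75 and the second well below it, which matches the deltas shown in cases 1 and 4.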

So you can infer that a line was changed as follows. This function returns the diff with changed lines tagged 'c' and unchanged lines tagged 'u', similar to Perl's sdiff:

import difflib

def sdiffer(s1, s2):
    differ = difflib.Differ()
    diffs = list(differ.compare(s1, s2))

    i = 0
    sdiffs = []
    length = len(diffs)
    while i < length:
        line = diffs[i][2:]
        if diffs[i].startswith('  '):
            sdiffs.append(('u', line))

        elif diffs[i].startswith('+ '):
            sdiffs.append(('+', line))

        elif diffs[i].startswith('- '):
            if i+1 < length and diffs[i+1].startswith('? '):
                # cases 2 and 3: '- ', '? ', '+ ' and optionally a second '? '
                sdiffs.append(('c', line))
                i += 3 if i + 3 < length and diffs[i + 3].startswith('? ') else 2

            elif i+2 < length and diffs[i+1].startswith('+ ') and diffs[i+2].startswith('? '):
                # case 1: '- ', '+ ', '? ' (bounds are checked before indexing)
                sdiffs.append(('c', line))
                i += 2
            else:
                # case 4: the line was really deleted
                sdiffs.append(('-', line))
        i += 1
    return sdiffs

Hope it helps.

P.S.: It is an old question, so I am not sure how well my efforts will be rewarded. :-( I just could not help answering it, as I have been working a little with difflib lately.

Emotional °昔
Reply #4 · 2019-07-17 16:39

I don't really know what Perl's "Change" operation is. If it is similar to PHP's diff output, this code solved my problem:

import difflib

def sdiffer(s1, s2):
    differ = difflib.Differ()
    diffs = list(differ.compare(s1, s2))

    i = 0
    sdiffs = []
    length = len(diffs)
    sequence = 0
    while i < length:
        line = diffs[i][2:]
        if diffs[i].startswith('  '):
            sequence += 1
            sdiffs.append((sequence, 'u', line))

        elif diffs[i].startswith('+ '):
            sequence += 1
            sdiffs.append((sequence, '+', line))

        elif diffs[i].startswith('- '):
            sequence += 1
            sdiffs.append((sequence, '-', line))
            if i+1 < length and diffs[i+1].startswith('? '):
                # changed line: '- ', '? ', '+ ' and optionally a second '? '
                if i+2 < length and diffs[i+2].startswith('+ '):
                    sequence += 1
                    sdiffs.append((sequence, '+', diffs[i+2][2:]))
                    # also skip the trailing guide line if there is one
                    i += 3 if i+3 < length and diffs[i+3].startswith('? ') else 2
            elif i+2 < length and diffs[i+1].startswith('+ ') and diffs[i+2].startswith('? '):
                # changed line: '- ', '+ ', '? '
                sequence += 1
                sdiffs.append((sequence, '+', diffs[i+1][2:]))
                i += 2
            # otherwise the line was deleted; a '+ ' line that follows it
            # is picked up by the next iteration
        i += 1
    return sdiffs
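
If the goal is only a numbered list of '-', '+' and 'u' entries, the look-ahead may not be needed. As far as I can tell, the intended output is every delta line except the '? ' guides, numbered in order. A minimal sketch of that idea (numbered_delta is a hypothetical name, not part of difflib):

```python
import difflib

def numbered_delta(s1, s2):
    """Number every Differ delta line except the '? ' guide lines.

    Returns (sequence, code, text) tuples where code is 'u', '-' or '+'.
    """
    result = []
    for entry in difflib.Differ().compare(s1, s2):
        code, text = entry[:2], entry[2:]
        if code == '? ':
            continue  # guide lines carry no text from either input
        tag = 'u' if code == '  ' else code[0]
        result.append((len(result) + 1, tag, text))
    return result

for row in numbered_delta(['The good bad', 'same'], ['The good the bad', 'same']):
    print(row)
```

This keeps changed lines as a '-' entry followed by a '+' entry, so it does not produce a 'c' code.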

Thanks @Sнаđошƒаӽ for your code.
