Why diff with ignore matching lines doesn't wo

Published 2019-01-25 09:11

我命由我不由天

Question:

I have the following files:

file1.txt

###################################################
Dump stat Title information for 'ssummary' view
###################################################
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}

file2.txt

###################################################
Dump stat Title information for 'ssummary' view
###################################################
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 9621932 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

And I'm running the following command:

diff -I 'Memory' file1.txt file2.txt

which outputs:

6,7c6,7
< Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
< Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
---
> Tab=> 'Memory' Title=> {text {Total memory allocated: 9621932 kB}}
> Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

However my expected output is:

< Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
---
> Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

Note that if I change 'Memory' to 'Tab' or 'Title' in the command, the problem seems solved, but probably only because all lines are then ignored, since they all contain Tab and Title.

Answer 1:

This behaviour does look a bit weird. I noticed something by tweaking your input files (I just moved the "Memory" line to the top in both files):

file1.txt

###################################################
Dump stat Title information for 'ssummary' view
###################################################
Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}

file2.txt

###################################################
Dump stat Title information for 'ssummary' view
###################################################
Tab=> 'Memory' Title=> {text {Total memory allocated: 9621932 kB}}
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

A plain diff gives you:

diff file1.txt file2.txt

4c4
< Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
---
> Tab=> 'Memory' Title=> {text {Total memory allocated: 9621932 kB}}
7c7
< Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
---
> Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

Notice that there are now two separate sets of differences (hunks). With that arrangement, the diff -I 'Memory' file1.txt file2.txt command works as intended and outputs this:

7c7
< Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
---
> Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

In other words, the -I flag seems to take effect only when every line in a set of differences matches the expression. I don't know whether this is a bug or expected behaviour, but it certainly looks inconsistent.

EDIT: Actually, according to the GNU diff documentation, this IS the expected behaviour. The GNU man page is not very clear about it. OpenBSD diff has a -I flag too, and its man page explains it better.
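The hunk rule observed above can be reproduced with a short script. This is a minimal sketch that assumes GNU diff; the temporary file names (adjacent1.txt, separated1.txt and so on) are made up for illustration, and the data is trimmed to the four Tab=> lines.

```shell
#!/bin/sh
# Sketch, assuming GNU diff. File names below are illustrative only.
work=$(mktemp -d)

# Layout from the question: the Memory and Cpu lines are adjacent.
cat > "$work/adjacent1.txt" <<'EOF'
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
EOF
# Second file: same lines with the Memory and Cpu numbers changed.
sed -e 's/962192/9621932/' -e 's/9030/90303/' \
    "$work/adjacent1.txt" > "$work/adjacent2.txt"

# Rearranged layout: move the Memory line to the top of each file.
for n in 1 2; do
    { grep 'Memory' "$work/adjacent$n.txt"
      grep -v 'Memory' "$work/adjacent$n.txt"; } > "$work/separated$n.txt"
done

# diff exits with status 1 when it reports differences, hence "|| true".
adjacent_out=$(diff -I 'Memory' "$work/adjacent1.txt" "$work/adjacent2.txt" || true)
separated_out=$(diff -I 'Memory' "$work/separated1.txt" "$work/separated2.txt" || true)

echo "Adjacent changes (one hunk):"
printf '%s\n' "$adjacent_out"
echo "Separated changes (two hunks):"
printf '%s\n' "$separated_out"

rm -rf "$work"
```

With the adjacent layout the Memory lines still show up, because they share a hunk with the Cpu lines. With the separated layout the Memory hunk stands alone, every line in it matches, and only the Cpu hunk is printed.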

Answer 2:

This behaviour is normal given the way diff works (as of April 2013).

diff is line-oriented: a line is considered either entirely different or entirely equivalent. A line that matches the ignore pattern still goes into the list of differing lines during comparison; only when the change script is computed are changes made up solely of ignored lines themselves treated as ignored. When ignored lines are adjacent to other changed lines, together they make up a single non-ignored change.

The problem lies in diff's inability to understand that consecutive lines are not related: you are not diffing a sequence of text (which is what diff is aimed at), but rather a list of independent lines that are keyed (Tab=> <key>). The two problems look pretty similar when both files are generated in the same order, but they are still not the same.
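To illustrate the "keyed lines" view, here is a minimal sketch that compares records by their key instead of by position. The keyed_diff helper is hypothetical (it is not part of diff), it assumes every record looks like Tab=> 'KEY' ..., and it only reports keys present in both files.

```shell
#!/bin/sh
# Hypothetical helper (not part of diff): compare "Tab=> 'KEY' ..." records by KEY.
# $1 = key to skip, $2 and $3 = files to compare.
keyed_diff() {
    awk -F"'" -v skip="$1" '
        !/^Tab=>/ || $2 == skip { next }   # keep keyed records only, minus the skipped key
        NR == FNR { left[$2] = $0; next }  # first file: remember each record under its key
        ($2 in left) && left[$2] != $0 {   # second file: report records whose text changed
            print "< " left[$2]
            print "> " $0
        }
    ' "$2" "$3"
}

work=$(mktemp -d)
cat > "$work/file1.txt" <<'EOF'
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
EOF
sed -e 's/962192/9621932/' -e 's/9030/90303/' "$work/file1.txt" > "$work/file2.txt"

keyed_out=$(keyed_diff 'Memory' "$work/file1.txt" "$work/file2.txt")
printf '%s\n' "$keyed_out"
# prints:
# < Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
# > Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 90303 ms}}

rm -rf "$work"
```

Because each record is looked up by key, it no longer matters whether the skipped line sits next to another changed line.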

Answer 3:

Well you learn something new every day. I was equally confused and frustrated by this behaviour, which seems to be roughly [diff the input files, then filter out the RE] rather than [filter the RE out of the input files, then diff].

I would have thought the second approach more natural and more useful. For instance this seems to be the way --ignore-case and --strip-trailing-cr work, adjusting the input files before diffing. Additionally, actually achieving what the questioner wanted involves filtering both inputs to temp files, diffing them, then removing them. It becomes even more tedious if you want to do a recursive diff as I did.

I acknowledge that diff behaves the way it's documented rather than how I want it to behave, but respectfully suggest that this option (and similar for -b, -w too) could usefully be added to diff.
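The "filter both inputs to temp files, diff them, then remove them" workaround might look like the sketch below. The filtered_diff function name and the temporary file names are assumptions made for illustration, and the pattern is treated as a basic regular expression by grep.

```shell
#!/bin/sh
# Hypothetical helper: drop lines matching a pattern from both files, then diff the rest.
# $1 = regular expression, $2 and $3 = files to compare.
filtered_diff() {
    dir=$(mktemp -d) || return 2
    # grep -v exits 1 when nothing is left; that is not an error here.
    grep -v -- "$1" "$2" > "$dir/left" || true
    grep -v -- "$1" "$3" > "$dir/right" || true
    status=0
    diff "$dir/left" "$dir/right" || status=$?
    rm -rf "$dir"
    return "$status"   # 0 = same, 1 = different, as with plain diff
}

work=$(mktemp -d)
cat > "$work/file1.txt" <<'EOF'
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
EOF
sed -e 's/962192/9621932/' -e 's/9030/90303/' "$work/file1.txt" > "$work/file2.txt"

filtered_out=$(filtered_diff 'Memory' "$work/file1.txt" "$work/file2.txt") || true
printf '%s\n' "$filtered_out"

rm -rf "$work"
```

One caveat with this approach: the line numbers in the output refer to the filtered copies, not to the original files.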

Answer 4:

This is expected behaviour as per the diffutils manual:

However, -I only ignores the insertion or deletion of lines that contain the regular expression if every changed line in the hunk (every insertion and every deletion) matches the regular expression.

In other words, for each non-ignorable change, diff prints the complete set of changes in its vicinity, including the ignorable ones. You can specify more than one regular expression for lines to ignore by using more than one -I option. diff tries to match each line against each regular expression, starting with the last one given. (man diff)

You may try to set a smaller set of changes by specifying -d, but in your example it won't work.

-d, --minimal: Try hard to find a smaller set of changes.
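Both points can be checked with a small script. This is a sketch that assumes GNU diff; the file names are illustrative and the data is trimmed to the four Tab=> lines.

```shell
#!/bin/sh
# Sketch, assuming GNU diff. File names are illustrative only.
work=$(mktemp -d)
cat > "$work/file1.txt" <<'EOF'
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
EOF
sed -e 's/962192/9621932/' -e 's/9030/90303/' "$work/file1.txt" > "$work/file2.txt"

# -d does not split the adjacent Memory/Cpu changes, so the hunk is still printed whole.
minimal_out=$(diff -d -I 'Memory' "$work/file1.txt" "$work/file2.txt" || true)

# Two -I options: every changed line now matches one of the expressions.
both_out=$(diff -I 'Memory' -I 'Cpu' "$work/file1.txt" "$work/file2.txt" || true)

echo "With -d -I 'Memory':"
printf '%s\n' "$minimal_out"
echo "With -I 'Memory' -I 'Cpu':"
printf '%s\n' "$both_out"

rm -rf "$work"
```

The second command prints nothing, but it hides the Cpu change as well, so it illustrates the rule rather than giving the output the question asks for.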

Answer 5:

If I recall the diff man page correctly, -I just ignores whatever matches the regular expression it is given. That would mean that if f1 is:

the pen is on the table

and f2 is:

the pun is on the table

then this command treats them as equal:

diff -I 'p.n' f1 f2

and prints nothing.

BUT

if f2 now becomes

the pun is on the cable

the regexp no longer covers the difference (cable and table are not matched by the regexp), so you would get both lines in the output.

So just try changing the command to:

diff -I '.*Memory.*' file1.txt file2.txt

That should do the trick (sorry for the silly examples).
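This suggestion is easy to test. The sketch below assumes GNU diff, where, per the manual excerpt quoted in Answer 4, -I is applied to whole changed lines; the file names are illustrative. Under that assumption 'Memory' and '.*Memory.*' select exactly the same lines, and the pen/pun lines both contain a match for 'p.n' even after table becomes cable.

```shell
#!/bin/sh
# Sketch, assuming GNU diff. File names are illustrative only.
work=$(mktemp -d)

printf 'the pen is on the table\n' > "$work/f1"
printf 'the pun is on the cable\n' > "$work/f2"
# Both changed lines contain a match for p.n, so the whole hunk is ignorable.
pen_out=$(diff -I 'p.n' "$work/f1" "$work/f2" || true)

cat > "$work/file1.txt" <<'EOF'
Tab=> 'Instance' Title=> {text {Total instances: 7831}}
Tab=> 'Device' Title=> {text {Total spice devices: 256}}
Tab=> 'Memory' Title=> {text {Total memory allocated: 962192 kB}}
Tab=> 'Cpu' Title=> {text {Total cumulative CPU time: 9030 ms}}
EOF
sed -e 's/962192/9621932/' -e 's/9030/90303/' "$work/file1.txt" > "$work/file2.txt"

# The two patterns match the same lines, so the two outputs should be identical.
plain_out=$(diff -I 'Memory' "$work/file1.txt" "$work/file2.txt" || true)
wrapped_out=$(diff -I '.*Memory.*' "$work/file1.txt" "$work/file2.txt" || true)

echo "pen/pun/cable output: [$pen_out]"
if [ "$plain_out" = "$wrapped_out" ]; then
    echo "'Memory' and '.*Memory.*' give the same output"
fi

rm -rf "$work"
```

If the two outputs are the same on your system, the Memory lines are still printed with either pattern, and the hunk rule from the manual remains the explanation.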

Tags: shell diff ignore
