How to count differences between two files on Linux

I need to work with large files and must find the differences between two of them. I don't need the differing content itself, only the number of differences.

To find the number of differing rows, I came up with:

diff --suppress-common-lines --speed-large-files -y File1 File2 | wc -l

And it works, but is there a better way to do it?

And how can I count the exact number of differences (with standard tools like bash, diff, awk, sed, or some old version of perl)?

Tags: shell count diff

6 Answers

手持菜刀，她持情操

#2 · 2020-02-16 12:17

If using Linux/Unix, what about comm -23 file1 file2 to print lines in file1 that aren't in file2, comm -23 file1 file2 | wc -l to count them, and similarly comm -13 ... for lines only in file2? Note that comm expects both files to be sorted.
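As a minimal sketch of the comm approach: the sample files and their contents below are hypothetical, made up only for illustration. The sketch sorts both inputs first, because comm compares sorted files, and uses -23 and -13 to keep only the lines unique to one side.

```shell
# Hypothetical sample files, for illustration only
tmp=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmp/file1"
printf 'banana\ncherry\ndate\nelder\n' > "$tmp/file2"

# comm expects sorted input, so sort into temporary copies first
sort "$tmp/file1" > "$tmp/file1.sorted"
sort "$tmp/file2" > "$tmp/file2.sorted"

# -23 suppresses columns 2 and 3, leaving lines found only in file1
only_in_1=$(comm -23 "$tmp/file1.sorted" "$tmp/file2.sorted" | wc -l)
# -13 suppresses columns 1 and 3, leaving lines found only in file2
only_in_2=$(comm -13 "$tmp/file1.sorted" "$tmp/file2.sorted" | wc -l)

echo "only in file1: $only_in_1"
echo "only in file2: $only_in_2"

rm -rf "$tmp"
```

Because the files are sorted before comparison, this counts lines by content, not by position, so it suits files where line order does not matter.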


霸刀☆藐视天下

#3 · 2020-02-16 12:21

Since every output line that differs starts with a < or > character, I would suggest this:

diff file1 file2 | grep '^[<>]' | wc -l

By using only < or > in the pattern, you can count the differing lines from just one of the files.
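A small sketch of the per-file counts, assuming two hypothetical sample files created only for this illustration. It uses grep -c in place of grep | wc -l, and adds || true because grep exits with a non-zero status when it finds no matches.

```shell
# Hypothetical sample files, for illustration only
tmp=$(mktemp -d)
printf 'min,max\n1,5\n3,4\n-2,10\n' > "$tmp/a"
printf 'min,max\n2,5\n3,4\n-1,1\nextra\n' > "$tmp/b"

# Lines prefixed with "<" come from the first file
removed=$(diff "$tmp/a" "$tmp/b" | grep -c '^<' || true)
# Lines prefixed with ">" come from the second file
added=$(diff "$tmp/a" "$tmp/b" | grep -c '^>' || true)

echo "lines only in a: $removed"
echo "lines only in b: $added"

rm -rf "$tmp"
```

A changed line shows up once on each side, so adding the two counts together counts every modified line twice.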


你好瞎i

#4 · 2020-02-16 12:33

If you want to count the number of changed regions (hunks, where a run of consecutive differing lines counts once), use this:

diff -U 0 file1 file2 | grep '^@' | wc -l

Doesn't John's answer double count the different lines?
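To make the distinction concrete, here is a rough sketch on two hypothetical files (x and y are invented for this example). Counting the @@ headers gives the number of hunks, while counting the - lines after the two file-name headers gives the number of lines removed from the first file.

```shell
# Hypothetical sample files: lines 2-3 and line 5 differ
tmp=$(mktemp -d)
printf 'l1\nl2\nl3\nl4\nl5\n' > "$tmp/x"
printf 'l1\nX2\nX3\nl4\nX5\n' > "$tmp/y"

# Each "@@" header marks one hunk, i.e. one run of adjacent changes
hunks=$(diff -U 0 "$tmp/x" "$tmp/y" | grep -c '^@@' || true)

# tail -n +3 skips the "---" and "+++" file-name headers,
# so only removed content lines start with "-"
removed=$(diff -U 0 "$tmp/x" "$tmp/y" | tail -n +3 | grep -c '^-' || true)

echo "hunks: $hunks"
echo "removed lines: $removed"

rm -rf "$tmp"
```

Here the two adjacent changed lines form a single hunk, so the hunk count is lower than the line count.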


仙女界的扛把子

#5 · 2020-02-16 12:35

If you're dealing with files whose content should match line for line in the same order (like CSV files describing similar things), and you would, for example, want to find 2 differences in the following files:

File a:    File b:
min,max    min,max
1,5        2,5
3,4        3,4
-2,10      -1,1

you could implement it in Python like this:

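from itertools import zip_longest

file1, file2 = "a.csv", "b.csv"  # placeholder paths; substitute your own files
different_lines = 0
with open(file1) as a, open(file2) as b:
    # zip_longest pads the shorter file with None, so surplus lines count as differences
    for line, other_line in zip_longest(a, b):
        if line != other_line:
            different_lines += 1
print(different_lines)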


The star\"

#6 · 2020-02-16 12:40

I believe the correct solution is in this answer, that is:

$ diff -y --suppress-common-lines a b | grep '^' | wc -l
1


Rolldiameter

#7 · 2020-02-16 12:42

diff -U 0 file1 file2 | grep -v '^@' | wc -l

Subtract 2 from the result to account for the two file-name lines (--- and +++) at the top of the diff listing. Unified format is probably a bit faster than side-by-side format.
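One caveat with subtracting 2 is that identical files produce no output at all, which would give -2. The sketch below is one possible way around that, not the answerer's exact command: the helper name count_changed_lines and the sample files are hypothetical, and it drops the two header lines with tail before counting the + and - lines.

```shell
# Hypothetical helper: prints the number of added plus removed lines
count_changed_lines() {
    # tail -n +3 drops the "---" and "+++" file-name headers; for identical
    # files diff prints nothing, so the count is 0 instead of -2.
    # grep -c '^[-+]' counts only added/removed lines, skipping "@@" headers.
    diff -U 0 "$1" "$2" | tail -n +3 | grep -c '^[-+]' || true
}

# Hypothetical sample files, for illustration only
tmp=$(mktemp -d)
printf 'min,max\n1,5\n3,4\n-2,10\n' > "$tmp/a"
printf 'min,max\n2,5\n3,4\n-1,1\n' > "$tmp/b"

changed=$(count_changed_lines "$tmp/a" "$tmp/b")
same=$(count_changed_lines "$tmp/a" "$tmp/a")

echo "a vs b: $changed"
echo "a vs a: $same"

rm -rf "$tmp"
```

Each modified line contributes one - and one + line, so halve the result if you want modified lines counted once and the files have no pure insertions or deletions.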

