I have a file, fileA, as shown below:
file A
chr1 123 aa b c d
chr1 234 a b c d
chr1 345 aa b c d
chr1 456 a b c d
....
And I have a number of files with similar columns in a directory dirB, against which I have to compare fileA.
To do this, I concatenated all the files in dirB with cat into a single file called fileB, and then compared the two files on key columns 1 and 2 as shown below:
awk 'FNR==NR{a[$1,$2]++;next}!a[$1,$2]' fileB fileA
This command uses columns 1 and 2 as the key and prints the rows of fileA whose key does not appear in fileB.
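Written out with comments, the same one-liner reads:

awk '
    # first file (fileB): remember every key built from columns 1 and 2
    FNR==NR { a[$1,$2]++; next }
    # second file (fileA): print rows whose key was never seen in fileB
    !a[$1,$2]
' fileB fileA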
However, fileB becomes too large to handle, in both disk space and memory, when there are a large number of files.
Could someone suggest an alternative that skips the step of concatenating all the files into fileB, so that fileA is compared directly with the files in dirB (see the sketch after the example below)?
A file in dirB looks similar, for example:
chr1 123 aa b c d xxxx abcd
chr1 234 a b c d
chr1 345 aa b c d yyyy defg
chr1 456 a b c d
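One way to drop the cat step is to let awk read the dirB files directly. FNR==NR no longer identifies all of the B files once there is more than one of them, so a FILENAME test can stand in for it. A rough sketch, assuming dirB/* expands to the files to compare against:

awk '
    # any file other than fileA is one of the dirB files: just record its keys
    FILENAME != "fileA" { seen[$1,$2]++; next }
    # fileA (the last argument): print rows whose key was never recorded
    !seen[$1,$2]
' dirB/* fileA

This avoids writing a concatenated fileB to disk, but the seen array still holds one entry per key across all of dirB, so the memory problem remains. Loading fileA into memory instead, as in the idea below, keeps memory proportional to fileA.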
Perhaps something along these lines:
Starting with file A, add each key to an array, with the contents of its row as the value. Then, for each of the B files, delete any elements from the array with matching keys. At the end, any elements remaining are those in A that weren't in any of the B files, so we can just loop through and print them out.
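A minimal sketch of that approach, again assuming the key is columns 1 and 2 and that dirB/* expands to all of the B files:

awk '
    # first file (fileA): remember every row, keyed by columns 1 and 2
    NR==FNR { rows[$1,$2] = $0; next }
    # dirB files: a matching key means the row is not unique to fileA, so drop it
    ($1,$2) in rows { delete rows[$1,$2] }
    # whatever is left never appeared in any dirB file
    END { for (key in rows) print rows[key] }
' fileA dirB/*

Memory now scales with fileA only. Note that the END loop does not preserve fileA's original line order; if that matters, the input order can be recorded in a second array and used when printing.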