Identifying common elements in multiple files

2019-05-16 10:32发布

问题:

I have 8 files of one column and non uniform number of rows in each column. I need to identify the elements which are common in all of these 8 files.

I can do this task for comparing two files, but I am unable to write workable one liner in shell to do the same.

Any ideas.....

Thank you in advance.

File 1
Paul
pawan

File 2
Raman
Paul
sweet
barua

File 3
Sweet
barua
Paul

The answer of the comparison of these three files should be Paul.

回答1:

python -c 'import sys;print "".join(sorted(set.intersection(*[set(open(a).readlines()) for a in sys.argv[1:]])))' File1 File2 File3

prints Paul for your files File1, File2 and File3.



回答2:

The following one-liner should do (change 3 to 8 to match your case)

$ sort * | uniq -c | grep 3
      3 Paul

Probably better to do this in python though, using sets...



回答3:

Perl

$ perl -lnE '$c{$_}{$ARGV}++ }{ print for grep { keys %{$c{$_}} == 8 } keys %c;' file[1-8]

It should be possible to get rid of the hard-coded 8 as well with @{[ glob "@ARGV" ]} but I don't have time to test it now.

This solution will correctly handle the existence of duplicate lines across files as well.



回答4:

Here I've been trying to find a concise way to make sure each match comes from a different file. If there are no duplicates within the files, it's fairly simple in perl:

perl -lnwE '$a{$_}++; END { for (keys %a) { print if $a{$_} == 3 } }' files*

The -l option will auto-chomp your input (remove newline), and add a newline to the print. This is important in case of missing eof newlines.

The -n option will read input from file name arguments (or stdin).

The hash assignment will count duplicates, and the END block will print out what duplicates appeared 3 times. Change 3 to however many files you have.

If you want a slightly more flexible version, you can count the arguments in a BEGIN block.

perl -lnwE 'BEGIN { $n = scalar @ARGV } 
    $a{$_}++; END { for (keys %a) { print if $a{$_} == $n } }' files*


回答5:

$ awk '++a[$0]==3' file{1..3}.txt
Paul

update

$ awk '(FILENAME SEP $0) in b{next}; b[FILENAME,$0]=1 && ++a[$0]==3' file{1..3}.txt
Paul


回答6:

This might work for you:

ls file{1..3} | 
xargs -n1 sort -u | 
sort | 
uniq -c | 
sed 's/^\s*'"$(ls file{1..3} | wc -l)"'\s*//p;d'