Bash: find patterns of a file in another file and

Posted 2019-02-28 10:20

Question:

Dear all,

I have been trying to solve this problem for a while and have checked many posts (for example "grep, awk or sed? Print lines in one file matching patterns in another file" and "awk search for a field in another file") without finding quite what I am looking for. I need a solution that uses bash tools such as sed, grep, or awk (no Python, R, ...).

I have two files (much larger than these examples):

file1:

   2   891299  0.50923964E-02     1248   4.713       1349.08
   3   245857  0.57915542E-02     1335   4.671       1369.65

file2:

   278    2645  2334659  0.75142      0.53123
   279    2643   245857  0.80439      0.56868
   500    1341   830677  0.74922      0.52958
   501    1339   882791  0.87685      0.61980
   502    1337   891299  0.63224      0.44680

In this example, for every line of file1, I want to look up the value in column 2 of file1 in column 3 of file2 and print column 1 of the matching file2 line, keeping the order of file1.

A possible solution (which I know is not bug-free) is the following bash loop, which is unacceptably slow:

for i in $(awk '{print $2}' file1); do grep " $i " file2 | awk '{print $1}'; done

which prints:

502
279
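The loop has two problems. It rereads file2 once for every line of file1, so its cost grows with the product of the two file sizes. Also, grep " $i " matches the value in any column of file2, not only column 3 (for example, a file1 key of 2643 would match the line for 279 through its column 2). A minimal sketch that keeps the one-key-at-a-time structure but compares column 3 exactly; `lookup_slow` is a hypothetical helper name used only for illustration:

```shell
# Sketch: same per-key loop as above, but compare column 3 of file2
# exactly instead of grepping " $i " anywhere on the line.
# It still rescans file2 once per line of file1, so it remains slow.
# lookup_slow is a hypothetical helper name.
lookup_slow() {
    local f1=$1 f2=$2 key
    # One key per line, taken from column 2 of file1, in file1 order.
    awk '{print $2}' "$f1" | while read -r key; do
        # Exact field comparison: only column 3 can match.
        awk -v k="$key" '$3 == k {print $1}' "$f2"
    done
}
```

Call it as `lookup_slow file1 file2`. It removes the false matches but is just as slow as the original loop.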

Please note that a 'simple' solution like:

awk 'NR==FNR{pats[$2]; next} $3 in pats' file1 file2

is not appropriate, because the output order follows file2 rather than file1 (i.e. it prints the line for 279 first and then the line for 502).

Thanks a lot for your help.

Marco

Answer 1:

You can swap the order in which awk reads the files, so that file2 builds the lookup table and file1 drives the output order:

awk 'NR==FNR{pats[$3]=$1; next} $2 in pats{print pats[$2]}' file2 file1
502
279
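In this one-liner, awk reads file2 first (NR==FNR holds only for the first file), storing column 1 under the key in column 3. It then reads file1 and prints the stored value for each column-2 key, so the output follows file1. A commented sketch of the same idea, wrapped in a hypothetical `lookup_fast` function; printing `NA` for file1 keys absent from file2 is an assumption added here to keep the output line-aligned with file1 (the one-liner above silently skips them):

```shell
# Sketch of the answer above, with comments. lookup_fast is a hypothetical
# helper name; printing "NA" for missing keys is an added assumption.
lookup_fast() {
    awk '
        # First file on the command line (file2): NR==FNR only while
        # reading it. Map column 3 -> column 1; a repeated key keeps
        # the last value seen.
        NR == FNR { pats[$3] = $1; next }
        # Second file (file1): output follows file1 line order.
        { print (($2 in pats) ? pats[$2] : "NA") }
    ' "$2" "$1"
}
```

Call it as `lookup_fast file1 file2`. Each file is read once, so the cost grows linearly with the input instead of rescanning file2 per key. If file2 is empty, NR==FNR stays true while reading file1, so that case needs a separate check.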