Dear all,
I have been trying to solve this problem for a while and have checked many posts (for example "grep, awk or sed? Print lines in one file matching patterns in another file" and "awk search for a field in another file") without finding what I am looking for. I need a solution that uses standard command-line tools such as sed, grep and awk (no Python, R, ...).
I have two files (the real ones are much bigger than these):
file1:
2 891299 0.50923964E-02 1248 4.713 1349.08
3 245857 0.57915542E-02 1335 4.671 1369.65
file2:
278 2645 2334659 0.75142 0.53123
279 2643 245857 0.80439 0.56868
500 1341 830677 0.74922 0.52958
501 1339 882791 0.87685 0.61980
502 1337 891299 0.63224 0.44680
For each line of file1, I want to look up the value in column 2 of file1 in column 3 of file2 and print column 1 of the matching file2 line, keeping the order of file1.
A possible solution, which I know is not bug-free, is the following bash loop, but it is unacceptably slow:
for i in $(awk '{print $2}' file1); do grep " $i " file2 | awk '{print $1}'; done
which prints to screen:
502
279
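Apart from speed, the loop has a correctness issue: `grep " $i " file2` matches the value in any column of file2, not only column 3, and would miss it if it were the last field on a line (there is no trailing space). A minimal sketch that keeps the same per-key structure but compares only column 3 might look like this. `slow_lookup_col1` is a hypothetical helper name, and fields are assumed to be whitespace-separated. It is still slow, because it rescans file2 once for every line of file1.

```shell
#!/usr/bin/env bash
# Hypothetical helper: slow_lookup_col1 FILE1 FILE2
# For each line of FILE1 (in order), print column 1 of every FILE2 line
# whose column 3 equals column 2 of FILE1.
# Still one full pass over FILE2 per FILE1 line, so it remains slow.
slow_lookup_col1() {
  awk '{print $2}' "$1" | while IFS= read -r key; do
    # Compare the whole third field, not a substring of the line.
    awk -v k="$key" '$3 == k {print $1}' "$2"
  done
}
```

With the sample files, `slow_lookup_col1 file1 file2` prints 502 and then 279.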
Please note that a 'simple' solution like:
awk 'NR==FNR{pats[$2]; next} $3 in pats {print $1}' file1 file2
is not appropriate, because the output order follows file2 rather than file1 (i.e. it prints 279 first and then 502).
Thanks a lot for your help.
Marco