I have two tab-separated files (please see the examples below):
File 1
Java RAJ
PERL ALEX
PYTHON MAurice
(and so on)
File 2
ALEX 3.4
SAM 8.9
PEPPER 9.0
Now, if a name from file 1 such as ALEX is also found in file 2 (it is not guaranteed to be there), I should get a third file looking like this:
PERL ALEX 3.4
The code should look up every value in column 2 of file 1 in file 2.
Any suggestions for a bash script?
You want to use join for that. You first need to sort both files on the join field, though:
join -1 2 -2 1 <(sort -k2,2 file1) <(sort -k1,1 file2)
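Note that join prints the join field first, so its default output for the sample data would be ALEX PERL 3.4 rather than PERL ALEX 3.4. Here is a minimal sketch that restores the requested column order with -o and keeps the tabs. The function name join_langs is made up for illustration. It assumes mktemp is available and uses a temporary file instead of process substitution so that it also runs under plain sh.

```shell
# Hypothetical helper (not from the original answer): join column 2 of the
# first file with column 1 of the second and print language, name, number.
join_langs() {
    tab=$(printf '\t')
    sorted1=$(mktemp) || return 1
    # LC_ALL=C makes sort and join agree on the collation order.
    LC_ALL=C sort -t "$tab" -k2,2 "$1" > "$sorted1"
    # "-" tells join to read the second (sorted) input from stdin.
    LC_ALL=C sort -t "$tab" -k1,1 "$2" |
        LC_ALL=C join -t "$tab" -1 2 -2 1 -o 1.1,1.2,2.2 "$sorted1" -
    rm -f "$sorted1"
}
```

Called as join_langs file1 file2 > file3, this should write the third file the question asks for.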
awk 'NR==FNR {val[$1]=$2; next} $2 in val {print $0, val[$2]}' file2 file1
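The awk one-liner reads file2 into an array first and then scans file1, so no sorting is needed. Here is a commented sketch of the same idea that splits strictly on tabs and writes tab-separated output. The wrapper name lookup_scores and the array name score are made up for illustration.

```shell
# Hypothetical wrapper: $1 is the language/name file, $2 the name/number file.
lookup_scores() {
    awk -F '\t' -v OFS='\t' '
        # First file on the command line (name/number): remember number per name.
        NR == FNR { score[$1] = $2; next }
        # Second file (language/name): print the row plus the number if the name is known.
        $2 in score { print $1, $2, score[$2] }
    ' "$2" "$1"
}
```

Called as lookup_scores file1 file2 > file3, this keeps the line order of file1 and silently skips names that do not appear in file2.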
Would a Perl one-liner also be OK? It works without sorting.
Assuming your files are called f1 and f2:
perl -e 'open(F1, shift); open(F2, shift); $ls = $/;undef $/;$f2 = <F2>;$/ = $ls; while(<F1>) { ($t1, $t2) = $_ =~ /^(\w+)\s+(\w+)$/; if($t1) { ($t3) = $f2 =~ /^$t2\s+(.+)$/m; print "$t1 $t2 $t3 \n" if ($t3); } }' f1 f2
With f1:
Java RAJ
PERL ALEX
PYTHON Maurice
And f2:
ALEX 3.4
SAM 8.9
PEPPER 9.0
Results in:
PERL ALEX 3.4
You received excellent answers using join and awk, so I thought I'd post a pure bash one:
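#!/bin/bash
# Needs bash 4 or later for associative arrays (declare -A).
declare -A name2prog
declare -A name2num
# Map name -> language from file1 and name -> number from file2.
while read -r a b; do name2prog[$b]=$a; done < file1
while read -r a b; do name2num[$a]=$b; done < file2
# Print every name present in both files (iteration order is unspecified).
for i in "${!name2num[@]}"
do
    if [[ ${name2prog[$i]} ]]; then
        echo "${name2prog[$i]} $i ${name2num[$i]}"
    fi
done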
outputs:
$ ./try.sh
PERL ALEX 3.4