I have two tab-delimited files.
File 1 (10 rows and say 4 columns, these columns might not be filled in each row):
Chra stra stpa NM1 NM2 NR1
Chrb strb stpb NR2 NM1
File 2 (25 rows and 3 columns):
Tg NM1 12
Tg NM3 3
Tg NR1 76
Now what I want to do, is to compare the NM and NR identifiers in present each row of file 1 to file 2i f anywhere in file2 NR identifier matches. It should extract the corresponding value of NR/NM identifier from file 2.
File 3 may look like this (say for NM1):
chra stra stpa NM1 12
chra stra stpa NR1 76
Any suggestions for a shell script?
$ join -1 4 -2 2 \
<(for i in 4 5 6 7; do join -e _ -j $i f1 f1 -o 1.1,1.2,1.3,0; done |
sed '/_$/d' | sort -k4,4) \
<(sort -k2,2 f2) \
-o 1.1,1.2,1.3,0,2.3
Chra stra stpa NM1 12
Chrb strb stpb NM1 12
Chra stra stpa NR1 76
Rather than shell script, I'd do this kind of thing with a Perl script. You can use the split() function to get an array with all the "fields" for each line, and it's downhill from there. No need to think up a fancy regular expression. Here's
an example of doing this kind of thing:
awk '
NR == FNR {tag[$2] = $3; next}
{
# determine if this line has a "NR" tag from file2
have_nr = 0
for (i=4; i<=NF; i++) {
if ($i ~ /^NR/ && $i in tag) {
have_nr = 1
break
}
}
# if it does have a matching NR tag, then
# print the tag value for every matching NR/NM tag
if (have_nr) {
for (i=4; i<=NF; i++) {
if ($i in tag) {
print $1, $2, $3, $i, tag[$i]
}
}
}
}
' file2 file1