For example, suppose I run the following command:
gawk -f AppendMapping.awk Reference.tsv TrueInput.tsv
Assume the names of files WILL change.
While iterating through the first file, I want to create a mapping.
map[$16]=$18
While iterating through the second file, I want to use the mapping.
print $1, map[$2]
What's the best way to achieve this behavior (i.e., different behavior for each input file)?
As you probably know, NR stores the current record (line) number; as you may or may not know, it's cumulative: it doesn't get reset between files. FNR, on the other hand, is specific to the current file, so you can compare the two to see whether you're in the first file (beyond the second you'll need to keep your own counter).
# In case you want to keep track of the file number
FNR == 1 { fileno++ }
NR == FNR {
# First file
}
NR != FNR {
# Second or later file
}
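Applied to the mapping from the question, the NR == FNR test might look like the sketch below. The sample files and their two-column layout are made up for illustration (the question maps field 16 to field 18 of Reference.tsv), and append_mapping is just a hypothetical wrapper name. The `next` keeps reference lines from falling through to the second rule. One caveat: if the first file is empty, NR == FNR is also true while reading the second file.

```shell
#!/bin/sh
# Hypothetical sample data: ref.tsv maps key -> value, input.tsv holds id and key.
tmp=$(mktemp -d)
printf 'k1\tapple\nk2\tbanana\n' > "$tmp/ref.tsv"
printf 'r1\tk2\nr2\tk1\n' > "$tmp/input.tsv"

append_mapping() {
  awk -F '\t' -v OFS='\t' '
    NR == FNR { map[$1] = $2; next }   # first file: build the mapping
    { print $1, map[$2] }              # later files: look up the key
  ' "$1" "$2"
}

append_mapping "$tmp/ref.tsv" "$tmp/input.tsv"
```

Because the rules test record counters rather than file names, the script keeps working when the file names change.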
You could also use getline in the BEGIN block to loop through the first file manually.
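BEGIN {
    file = ARGV[1]
    # Compare with > 0: getline returns -1 on error, which a bare test would loop on forever
    while ((getline < file) > 0) {
        # Process line (it is in $0)
    }
    close(file)
    # Remove the file from ARGV so the main loop doesn't read it again
    delete ARGV[1]
}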
Gawk versions 4 and higher offer the special BEGINFILE (and ENDFILE) blocks as well as the usual BEGIN and END blocks. Use them to set flags on which you vary the behavior of your code.
Recall that patterns can include comparisons with variables, so you can select patterns directly on the value of your flags.
The man page says:
For each input file, if a BEGINFILE rule exists, gawk executes the associated code before processing
the contents of the file. Similarly, gawk executes the code associated with ENDFILE after processing
the file.
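A sketch of the flag approach, assuming gawk 4.0 or later is installed as `gawk` (BEGINFILE is a gawk extension, not POSIX awk). The `fileno` counter, the by_file wrapper, and the sample files are all made up for illustration.

```shell
#!/bin/sh
# Requires gawk >= 4.0. Hypothetical sample data, as two-column TSV files.
tmp=$(mktemp -d)
printf 'k1\tapple\n' > "$tmp/ref.tsv"
printf 'r1\tk1\n' > "$tmp/input.tsv"

by_file() {
  gawk -F '\t' -v OFS='\t' '
    BEGINFILE { fileno++ }               # runs once before each input file
    fileno == 1 { map[$1] = $2 }         # first file: build the mapping
    fileno == 2 { print $1, map[$2] }    # second file: use it
  ' "$@"
}

by_file "$tmp/ref.tsv" "$tmp/input.tsv"
```

Since BEGINFILE fires once per file regardless of its contents, the counter does not depend on the first file being non-empty.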
This might work for you:
seq 5 >/tmp/a
seq 100 105 >/tmp/b
awk 'FILENAME==ARGV[1]{print FILENAME,$0};FILENAME==ARGV[2]{print $0,FILENAME}' /tmp/{a,b}
/tmp/a 1
/tmp/a 2
/tmp/a 3
/tmp/a 4
/tmp/a 5
100 /tmp/b
101 /tmp/b
102 /tmp/b
103 /tmp/b
104 /tmp/b
105 /tmp/b
So by combining FILENAME with ARGV[n], where n is the position of the file on the command line, awk can apply different actions to individual files.
N.B. ARGV[0] would be the awk command.