OK, so after spending 2 days on this, I am not able to solve it and I am almost out of time now. It might be a very silly question, so please bear with me. My awk script does something like this:
BEGIN{ n=50; i=n; }
FNR==NR {
# Read file-1, which has just 1 column
ids[$1]=int(i++/n);
next
}
{
# Read file-2 which has 4 columns
# Do something
next
}
END {...}
It works fine. But now I want to extend it to read 3 files. Let's say, instead of hard-coding the value of "n", I need to read a properties file and set the value of "n" from that. I found this question and have tried something like this:
BEGIN{ n=0; i=0; }
FNR==NR {
# Block A
# Try to read file-0
next
}
{
# Block B
# Read file-1, which has just 1 column
next
}
{
# Block C
# Read file-2 which has 4 columns
# Do something
next
}
END {...}
But it is not working. Block A is executed for file-0, and I am able to read the property from the properties file. But Block B is executed for both file-1 and file-2, and Block C is never executed.
Can someone please help me solve this? I have never used awk before and the syntax is very confusing. Also, if someone can explain how awk reads input from different files, that will be very helpful.
Please let me know if I need to add more details to the question.
Update: The solution below works as long as all input files are nonempty, but see @Ed Morton's answer for a simpler and more robust way of adding file-specific handling.
However, this answer still provides a hopefully helpful explanation of some awk basics and why the OP's approach didn't work.
Try the following approach (note that I've made the indices 1-based, as that's how awk does it):
FNR==1 is true whenever a new input file is starting to get processed (FNR contains the input file-relative line number).
Every time a new file starts processing, fIndex is incremented and thus reflects the 1-based index of the current input file. Tip of the hat to @twalberg's helpful answer.
An uninitialized awk variable used in a numeric context defaults to 0, so there's no need to initialize fIndex (unless you want a different start value).
A pattern such as fIndex == 1 can then be used to execute blocks for lines from a specific input file only (assuming the block ends in next).
As for why your approach didn't work:
Your 2nd and 3rd blocks are potentially executed unconditionally, for lines from all input files, because they are not preceded by a pattern (condition).
So your 2nd block is entered for lines from all subsequent input files, and its next statement then prevents the 3rd block from ever getting reached.
Potential misconceptions:
Perhaps you think that each block functions as a loop processing a single input file. This is NOT how awk works. Instead, the entire awk program is processed in a loop, with each iteration processing a single input line, starting with all lines from file 1, then from file 2, ...
An awk program can have any number of blocks (typically preceded by patterns), and whether they're executed for the current input line is solely governed by whether the pattern evaluates to true; if there is no pattern, the block is executed unconditionally (across input files). However, as you've already discovered, next inside a block can be used to skip subsequent blocks (pattern-block pairs).
Perhaps you need to consider adding some additional structure like this:
If you have gawk, just test ARGIND:
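A sketch of the ARGIND test, assuming GNU awk is installed under the name gawk. ARGIND is gawk's index into ARGV of the file currently being read. The file names and contents are hypothetical; the second file is deliberately empty to show that the third file still gets index 3.

```sh
#!/bin/sh
# ARGIND is a gawk extension, so skip the demo when gawk is missing.
command -v gawk >/dev/null 2>&1 || { echo "gawk not found" >&2; exit 0; }

# Hypothetical sample inputs.
dir=$(mktemp -d)
printf 'a\nb\n' > "$dir/first"
: > "$dir/second"                  # empty on purpose
printf 'c\n' > "$dir/third"

out=$(gawk '
ARGIND == 1 { print "first:", $0; next }
ARGIND == 2 { print "second:", $0; next }   # never runs: the file has no lines
ARGIND == 3 { print "third:", $0 }
' "$dir/first" "$dir/second" "$dir/third")

printf '%s\n' "$out"
rm -rf "$dir"
```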
If you don't have gawk, get it.
In other awks though you can just test for the file name:
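A sketch of the file-name test for awks without ARGIND, assuming each file appears only once on the command line. It compares FILENAME with the corresponding ARGV entry; the sample files and the per-file line counting are hypothetical stand-ins for real processing.

```sh
#!/bin/sh
# Hypothetical sample inputs with 1, 3 and 2 lines.
dir=$(mktemp -d)
printf 'n=2\n' > "$dir/file-0"
printf 'a\nb\nc\n' > "$dir/file-1"
printf 'a 1 2 3\nc 4 5 6\n' > "$dir/file-2"

out=$(awk '
FILENAME == ARGV[1] { c1++; next }   # lines of the first file
FILENAME == ARGV[2] { c2++; next }   # lines of the second file
FILENAME == ARGV[3] { c3++ }         # lines of the third file
END { print c1, c2, c3 }
' "$dir/file-0" "$dir/file-1" "$dir/file-2")

printf '%s\n' "$out"
rm -rf "$dir"
```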
That only fails if you want to parse the same file twice. If that's the case, you need to add a count of the number of times that file has been opened.