I am trying to add a last line to a file that I am creating. How can I detect the last line of a file in awk before END? I need this because my variables don't work in the END block, so I am trying to avoid using END.
awk ' { do some things..; add a new last line into file;}'
before END. I don't want this:
awk 'END{print "something new" >> "newfile.txt"}'
One option is to use the getline function to process the file. It returns 1 on success, 0 on end of file, and -1 on an error.
awk '
FNR == 1 {
## Process first line.
print FNR ": " $0;
while ( getline == 1 ) {
## Process from second to last line.
print FNR ": " $0;
}
## Here all lines have been processed.
print "After last line";
}
' infile
Assuming infile contains this data:
one
two
three
four
five
Output will be:
1: one
2: two
3: three
4: four
5: five
After last line
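Applied to the original question, the same loop can copy every line into the new file and then append the extra line, with no END block. This is only a minimal sketch, assuming a POSIX awk: infile, newfile.txt and the text "something new" are taken from the question, while the sample data and the count variable are hypothetical and only show that variables are still in scope after the loop.

```shell
# Sample input (hypothetical data).
printf 'one\ntwo\nthree\n' > infile

awk -v out="newfile.txt" '
FNR == 1 {
    # First line: create (or truncate) the output file and write the line.
    print $0 > out
    count = 1
    # Remaining lines: a plain getline refreshes $0, NR and FNR.
    while ((getline) > 0) {
        print $0 > out
        count++
    }
    # All input is consumed; variables set above are still available.
    print "something new (" count " lines copied)" > out
}
' infile

cat newfile.txt
```

With the sample data, newfile.txt ends with the line "something new (3 lines copied)". Note that an empty input file never triggers the FNR == 1 rule, so nothing is written in that case.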
$ cat file
1
2
3
4
5
By reading the same file twice (recommended)
$ awk 'FNR==NR{last++;next}{print $0, ((last==FNR)?"I am Last":"")}' file file
1
2
3
4
5 I am Last
Using getline
$ awk 'BEGIN{while((getline t < ARGV[1]) > 0)last++;close(ARGV[1])}{print $0, ((last==FNR)?"I am Last":"")}' file
1
2
3
4
5 I am Last
Print the previous line instead of the current one.
When the current line is 2, print line 1;
when the current line is 3, print line 2;
and so on until the end.
The last line is then still held in str when END runs.
awk '{
if (NR>1) {
# process str
print str;
}
str=$0;
}
END {
# process whatever needed before printing the last line and then print the last line.
print str;
}'
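To see the one-line delay in action, the sketch below runs the same pattern on a small sample and tags the final line. The file names sample.txt and tagged.txt, the variable name prev, and the tag text are placeholders chosen for illustration. It also shows that a user variable set in the main rule is still set in END.

```shell
printf 'one\ntwo\nthree\n' > sample.txt

awk '
NR > 1 { print prev }     # emit the line held back from the previous record
       { prev = $0 }      # hold back the current line
END    { if (NR > 0) print prev " <- last line" }   # the held line is the last one
' sample.txt > tagged.txt

cat tagged.txt
```

With the sample data this prints "one", "two", and "three <- last line". Only one line is buffered at a time, so the approach also works on piped input where the file cannot be read twice.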
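You can get the number of lines in the file with "wc -l < " ARGV[1] | getline filesize in the BEGIN block, and then use NR == filesize in the script body to test for the last line. (A bare "wc -l" with no file name would read awk's own standard input instead.)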
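The line-count idea could look roughly like this end to end. It is a sketch under a few assumptions: the file is passed as the first argument (FILENAME is not set yet in BEGIN, so ARGV[1] is used), the file name contains no double quotes, and the file ends with a newline, since wc -l counts newline characters. The names sample.txt, counted.txt and cmd are placeholders.

```shell
printf 'one\ntwo\nthree\n' > sample.txt

awk '
BEGIN {
    # Build the command from the first file argument.
    cmd = "wc -l < \"" ARGV[1] "\""
    cmd | getline filesize
    close(cmd)
    filesize += 0      # force a number, dropping any padding printed by wc
}
{ print $0 (NR == filesize ? " <- last line" : "") }
' sample.txt > counted.txt

cat counted.txt
```

With the sample data the last output line is "three <- last line" and the other lines are printed unchanged.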
You can use ENDFILE, which executes before END:
$ awk 'END {print "end"} ENDFILE{print "last line"}' /dev/null /dev/null
last line
last line
end
ENDFILE is a gawk extension, available since gawk 4.0; it is not part of POSIX awk.
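For an awk without ENDFILE, a per-file hook can be approximated by noticing when FNR resets to 1. The sketch below is not ENDFILE itself, only a rough emulation: it still needs END for the final file, and unlike gawk's ENDFILE it never fires for empty files. The function name endfile, the variable prevname, and the file names are hypothetical.

```shell
printf 'a\nb\n' > first.txt
printf 'c\n' > second.txt

awk '
# Hypothetical helper: called once per non-empty input file, after its last line.
function endfile(name) { print "-- end of " name }

FNR == 1 && NR > 1 { endfile(prevname) }   # a new file started, so the previous one ended
{ prevname = FILENAME; print }
END { if (NR > 0) endfile(prevname) }      # the final file has no successor
' first.txt second.txt > hooked.txt

cat hooked.txt
```

With the two sample files, "-- end of first.txt" appears after "b" and "-- end of second.txt" appears after "c".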
I know an answer was accepted, but I think it is simply wrong,
because you want to use awk as a parser and not as a general-purpose programming language.
Awk should be used as one stage within Unix pipes; it should not carry the program logic itself.
I had the same problem and I solved it within awk like this:
nlines=$(wc -l < file)
cat file | awk -v nl="${nlines}" '{if (nl != NR) {print $0,",","\\";} else {print;}}' >> "${someout}"
There is an important point here: pipes, flushing, and RAM.
If you let awk emit its output as it goes, you can pipe it to the next processor.
If you use getline, particularly within a loop, you might not see the end.
getline should be used only to read a single line, when the current line depends on the next one.
I love awk, but we cannot do everything with it!
EDITED:
For those who down-voted the answer, I just want to present this script:
#! /bin/sh
#
# Generate random strings
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 100000 > x.r.100000
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 1000000 > x.r.1000000
cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 5000000 > x.r.5000000
#
# To save you time in case
#cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 32 | head -n 10000000 > x.r.10000000
#
# Generate awk files
cat <<"EOF" > awkGetline.sh
#! /bin/sh
#
awk '
FNR == 1 {
## Process first line.
print FNR ": " $0;
while ( getline == 1 ) {
## Process from second to last line.
print FNR ": " $0;
}
}
' x.r
#
EOF
#
chmod +x awkGetline.sh
#
cat <<"EOF" > awkPlain.sh
#! /bin/sh
#
awk '
{print FNR ": " $0;}
' x.r
#
EOF
#
# x.r.100000
#
chmod +x awkPlain.sh
#
# Execute awkGetline.sh 10 times on x.r.100000
rm -f x.t
cp x.r.100000 x.r
for runInstance in 1 2 3 4 5 6 7 8 9 10;
do
/usr/bin/time -p -a -o x.t ./awkGetline.sh > x.1.out;
done;
#
cat x.t | grep real | awk 'BEGIN {sum=0.0} {sum=sum+$2; print $2, sum/10;} END {print "SUM Getln", sum;}' | grep SUM
#
#
# Execute awkPlain.sh 10 times on x.r.100000
rm -f x.t
cp x.r.100000 x.r
for runInstance in 1 2 3 4 5 6 7 8 9 10;
do
/usr/bin/time -p -a -o x.t ./awkPlain.sh > x.1.out;
done;
#
cat x.t | grep real | awk 'BEGIN {sum=0.0} {sum=sum+$2; print $2, sum/10;} END {print "SUM Plain", sum;}' | grep SUM
#
#
# x.r.1000000
#
chmod +x awkPlain.sh
#
# Execute awkGetline.sh 10 times on x.r.1000000
rm -f x.t
cp x.r.1000000 x.r
for runInstance in 1 2 3 4 5 6 7 8 9 10;
do
/usr/bin/time -p -a -o x.t ./awkGetline.sh > x.1.out;
done;
#
cat x.t | grep real | awk 'BEGIN {sum=0.0} {sum=sum+$2; print $2, sum/10;} END {print "SUM Getln", sum;}' | grep SUM
#
#
# Execute awkPlain.sh 10 times on x.r.1000000
rm -f x.t
cp x.r.1000000 x.r
for runInstance in 1 2 3 4 5 6 7 8 9 10;
do
/usr/bin/time -p -a -o x.t ./awkPlain.sh > x.1.out;
done;
#
cat x.t | grep real | awk 'BEGIN {sum=0.0} {sum=sum+$2; print $2, sum/10;} END {print "SUM Plain", sum;}' | grep SUM
#
#
# x.r.5000000
#
chmod +x awkPlain.sh
#
# Execute awkGetline.sh 10 times on x.r.5000000
rm -f x.t
cp x.r.5000000 x.r
for runInstance in 1 2 3 4 5 6 7 8 9 10;
do
/usr/bin/time -p -a -o x.t ./awkGetline.sh > x.1.out;
done;
#
cat x.t | grep real | awk 'BEGIN {sum=0.0} {sum=sum+$2; print $2, sum/10;} END {print "SUM Getln", sum;}' | grep SUM
#
#
# Execute awkPlain.sh 10 times on x.r.5000000
rm -f x.t
cp x.r.5000000 x.r
for runInstance in 1 2 3 4 5 6 7 8 9 10;
do
/usr/bin/time -p -a -o x.t ./awkPlain.sh > x.1.out;
done;
#
cat x.t | grep real | awk 'BEGIN {sum=0.0} {sum=sum+$2; print $2, sum/10;} END {print "SUM Plain", sum;}' | grep SUM
#
exit;
# To save you time in case
#
# x.r.10000000
#
chmod +x awkPlain.sh
#
# Execute awkGetline.sh 10 times on x.r.10000000
rm -f x.t
cp x.r.10000000 x.r
for runInstance in 1 2 3 4 5 6 7 8 9 10;
do
/usr/bin/time -p -a -o x.t ./awkGetline.sh > x.1.out;
done;
#
cat x.t | grep real | awk 'BEGIN {sum=0.0} {sum=sum+$2; print $2, sum/10;} END {print "SUM Getln", sum;}' | grep SUM
#
#
# Execute awkPlain.sh 10 times on x.r.10000000
rm -f x.t
cp x.r.10000000 x.r
for runInstance in 1 2 3 4 5 6 7 8 9 10;
do
/usr/bin/time -p -a -o x.t ./awkPlain.sh > x.1.out;
done;
#
cat x.t | grep real | awk 'BEGIN {sum=0.0} {sum=sum+$2; print $2, sum/10;} END {print "SUM Plain", sum;}' | grep SUM
#
And of course the first results:
tmp]$ ./awkRun.sh
SUM Getln 0.78
SUM Plain 0.71
SUM Getln 7.2
SUM Plain 6.49
SUM Getln 35.91
SUM Plain 32.92
Here the plain version saves roughly 8 to 10% of the time, just by avoiding getline.
Put this inside more complex logic and the picture might get even worse. This plain version does not account for memory considerations,
and they do not seem to play a role in this simple case. But memory might also play a role if you get into more complex logic ...
Of course, try it on your own machine.
This is why I was suggesting that you consider other options, in general.
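The relative difference can be worked out from the totals reported above. The snippet below only redoes that arithmetic on the posted numbers; it does not rerun the benchmark, and the output file name speedup.txt is a placeholder.

```shell
# Totals copied from the results above: line count, getline total, plain total.
printf '100000 0.78 0.71\n1000000 7.2 6.49\n5000000 35.91 32.92\n' |
awk '{ printf "%s lines: plain loop takes %.1f%% less time\n", $1, ($2 - $3) / $2 * 100 }' > speedup.txt

cat speedup.txt
```

For these totals it reports 9.0%, 9.9% and 8.3% for the three file sizes.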