In the first week of October, Arnold Robbins announced on the GNU-announce, bug-gawk and comp.lang.awk mailing lists that a beta release of gawk 4.2.0 is now available. It can be downloaded from http://www.skeeve.com/gawk/gawk-4.1.65.tar.gz 1 and, as he puts it, "This is a major release, with many significant new features."
So I went through the NEWS file to dig into these features and stopped at this point to run some tests:
Changes from 4.1.4 to 4.2.0
...
- Revisions in the POSIX standard remove the special case for POSIX mode when FS = " " where newline was not a field separator. The code and doc have been updated.
If I understand correctly, this refers to GNU Awk User's Guide → 4.5.2 Using Regular Expressions to Separate Fields:
There is an important difference between the two cases of ‘FS = " "’ (a single space) and ‘FS = "[ \t\n]+"’ (a regular expression matching one or more spaces, TABs, or newlines). For both values of FS, fields are separated by runs (multiple adjacent occurrences) of spaces, TABs, and/or newlines. However, when the value of FS is " ", awk first strips leading and trailing whitespace from the record and then decides where the fields are.
That is, the difference between using FS = " " and FS = "[ \t\n]+".
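The difference described in that passage can be seen with a record that has leading and trailing blanks. This is a minimal sketch, assuming any POSIX-style awk is on the PATH; the sample record and the variable names are made up for illustration:

```shell
# Made-up record with leading and trailing blanks.
record='  hello   how are  '

# FS = " ": leading and trailing whitespace is stripped before splitting.
nf_space=$(printf '%s\n' "$record" | awk -F' ' '{ print NF }')

# FS = "[ \t\n]+": the leading and trailing runs of blanks act as separators,
# so an empty field appears at each end of the record.
nf_regex=$(printf '%s\n' "$record" | awk -F'[ \t\n]+' '{ print NF }')

printf 'single-space FS: NF=%s\n' "$nf_space"   # NF=3
printf 'regex FS:        NF=%s\n' "$nf_regex"   # NF=5
```

The three words are the same in both cases; only the empty first and last fields account for the different NF.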
I ran a test with the new version in --posix mode:
$ ./gawk --posix -F" " '{print "NR:", NR; for(i=1;i<=NF;i++) print i, $i}' <<< "hello how are
you"
NR: 1
1 hello
2 how
3 are
NR: 2
1 you
Then I compared it with my previous gawk (4.1.3) and could not see any difference:
$ gawk --posix -F" " '{print "NR:", NR; for(i=1;i<=NF;i++) print i, $i}' <<< "hello how are
you"
NR: 1
1 hello
2 how
3 are
NR: 2
1 you
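A note on both runs: with the default RS, every input line is its own record, so the newline in the here-string ends record 1 and never reaches field splitting. A minimal sketch of a test where the newline stays inside the record, assuming a made-up ';' record terminator and an AWK variable that can be pointed at either build (for example AWK='./gawk --posix'); I have not confirmed that this is the case the NEWS entry is about:

```shell
# Assumption: AWK can be overridden, e.g. AWK='gawk --posix' or AWK='./gawk --posix'.
AWK=${AWK:-awk}

# Made-up input: one ';'-terminated record that spans two lines.
input='hello how are
you;'

# RS=";" keeps the newline inside the record, so FS=" " has to decide
# whether that newline separates "are" from "you".
result=$(printf '%s' "$input" | $AWK -F' ' -v RS=';' '{ print "NF:", NF; for (i = 1; i <= NF; i++) print i, $i }')
printf '%s\n' "$result"
```

Without --posix the newline counts as a field separator and the record has four fields; running the same command under --posix in both versions would show whether they disagree on this input.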
All in all, my question is: what is the difference in the behaviour of FS = " " in --posix mode for GNU Awk 4.2? What exactly has changed?
1 Yes, I also thought it should be 4.2.tar.gz, but http://www.skeeve.com/gawk/gawk-4.2.tar.gz does not exist.