On Linux, this runs as expected:
$ echo -e "line1\r\nline2"|awk -v RS="\r\n" '/^line/ {print "awk: "$0}'
awk: line1
awk: line2
But under Windows the \r is dropped, so RS never matches and awk treats the whole input as one record:
$ echo -e "line1\r\nline2"|awk -v RS="\r\n" '/^line/ {print "awk: "$0}'
awk: line1
line2
Versions: GNU Awk 4.0.1 on Windows, GNU Awk 3.1.8 on Linux.
EDIT from @EdMorton (sorry if this is an unwanted addition but I think maybe it helps demonstrate the issue):
Consider this RS setting and input (on cygwin):
$ awk 'BEGIN{printf "\"%s\"\n", RS}' | cat -v
"
"
$ echo -e "line1\r\nline2" | cat -v
line1^M
line2
This is Solaris with gawk:
$ echo -e "line1\r\nline2" | awk '1' | cat -v
line1^M
line2
and this is cygwin with gawk:
$ echo -e "line1\r\nline2" | awk '1' | cat -v
line1
line2
RS was just its default newline, so where did the control-M go in cygwin?
I just checked with Arnold Robbins (the maintainer of gawk), and the answer is that the translation is done by the C libraries. To stop it from happening, set the awk BINMODE variable to 3.
See the man page for more info if interested.
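As a minimal sketch of that suggestion (assuming gawk, where BINMODE=3 requests binary mode for both reading and writing), the command from the question could be run like this. I use printf rather than echo -e only so the sample input is the same in every shell; the sample data itself is just for illustration.

```shell
# Assumes gawk. BINMODE=3 asks for binary I/O on input and output, so the
# C library should not translate \r\n to \n before awk splits records.
# On POSIX systems BINMODE has no effect, so this is harmless there.
printf 'line1\r\nline2\r\n' |
  awk -v BINMODE=3 -v RS='\r\n' '/^line/ {print "awk: " $0}'
```

BINMODE has to be set before any input is read, which is why it is passed with -v here rather than assigned inside the main rule.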
It seems like the issue is awk-specific under Cygwin. I tried a few different things, and it seems that awk is silently replacing \r\n with \n in the input data.

If we simply ask awk to repeat the text unmodified, it will "sanitize" the carriage returns without asking. It will, however, leave other carriage returns intact.

Using a custom record separator of _ ended up leaving the carriage returns intact.

The most telling example involves having \r\n in the data, but not as a record separator. Even then, awk blindly converts \r\n to \n in the input data even though we didn't ask it to.

This substitution seems to be happening before record separation is applied, which explains why RS="\r\n" never matches anything. By the time awk is looking for \r\n, it has already replaced it with \n in the input data.