I have this strange phenomenon I don't understand. I guess I'm missing something important in awk. I want to cumulatively concatenate the whole line to a string called 'c' when there is a line containing SIGNAL. The string starts with a concatenation of 'a' and 'b' which works fine.
file in.dat:
SIGNAL Hello1!
file tt.awk:
BEGIN { a = "a"; b = "b"; c = a b; }
/SIGNAL/ { c = c " " $0; }
END { print c; }
When I do awk -f tt.awk in.dat I get (as expected):
ab SIGNAL Hello1!
Now I change in.dat to:
SIGNAL Hello1!
SIGNAL Hello2!
Then I do awk -f tt.awk in.dat again and get:
SIGNAL Hello2!1!
I expected to see:
ab SIGNAL Hello1! SIGNAL Hello2!
I am doing this in my CentOS shell (with a bunch of settings in my ~/.cshrc file). I tried the same thing in my Cygwin shell and it works as I expect. Something seems to be wrong with my CentOS shell settings. What could it be?
This is a problem with DOS line endings (as noted by Etan Reisner in the comments above). Your second version of in.dat uses \r\n for line breaks, and awk on UNIX treats only the \n as the end of a line, so the \r stays attached to the end of each line.
Using your same tt.awk code:
Wondering what this is really doing? In UNIX, \r resets the position in the line to the leftmost place but does not send you down a line (that's what \n does). DOS interprets \n as going down a line but not resetting to the leftmost position, while UNIX takes the \r as implicit.
Here are some experiments to illustrate what's going on:
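A minimal sketch of such experiments, assuming a POSIX shell with printf, od, and awk available. The file name in_dos.dat is hypothetical and is created here only to reproduce the CRLF input.

```shell
# Reproduce the two-line input with DOS (CRLF) line endings.
printf 'SIGNAL Hello1!\r\nSIGNAL Hello2!\r\n' > in_dos.dat

# 1. Dump the raw bytes: each line should end in \r followed by \n.
od -c in_dos.dat

# 2. awk removes the \n but keeps the \r inside $0, so each record is
#    one character longer than it looks (14 visible characters plus \r).
awk '{ print length($0) }' in_dos.dat

# 3. Build the same string as tt.awk and dump it: the \r characters
#    are still embedded between the concatenated lines.
awk 'BEGIN { c = "ab" } /SIGNAL/ { c = c " " $0 } END { print c }' in_dos.dat | od -c
```

Viewing the result through od rather than printing it to the terminal matters here, because the terminal acts on each \r instead of showing it.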
Pay special attention to the last two items. awk strips the \n for you but retains the \r, so the first line prints as ab SIGNAL Hello1!, then the \r returns the cursor to the start of the line, and the second line, SIGNAL Hello2! (preceded by the joining space), is written on top of that first line. The first line's final two characters (1!) remain because the second line wasn't long enough to overwrite them.
Now that we know the issue, we can fix the code:
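One way to write such a fix, as a sketch rather than the only option: call gsub on the record to delete every \r before appending it to c. The file names tt_fixed.awk and in_dos.dat are hypothetical, and a POSIX shell is assumed.

```shell
# Same CRLF input as in the question.
printf 'SIGNAL Hello1!\r\nSIGNAL Hello2!\r\n' > in_dos.dat

# tt.awk with one change: delete carriage returns from $0 before appending.
cat > tt_fixed.awk <<'EOF'
BEGIN { a = "a"; b = "b"; c = a b }
/SIGNAL/ { gsub(/\r/, ""); c = c " " $0 }
END { print c }
EOF

awk -f tt_fixed.awk in_dos.dat
# prints: ab SIGNAL Hello1! SIGNAL Hello2!
```

With two arguments, gsub operates on $0, so the cleanup happens once per matching line and the rest of the script stays unchanged.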
This removes all \r characters from lines that are added to c.