awk string concatenation not working 2 (shell sett

Posted 2019-07-25 20:56

I have this strange phenomenon I don't understand. I guess I'm missing something important in awk. I want to cumulatively concatenate the whole line to a string called 'c' when there is a line containing SIGNAL. The string starts with a concatenation of 'a' and 'b' which works fine.

file in.dat:

SIGNAL Hello1!

file tt.awk:

BEGIN    { a = "a"; b = "b"; c = a b; }
/SIGNAL/ { c = c " " $0; }
END      { print c; }

When I run awk -f tt.awk in.dat I get (as expected):

ab SIGNAL Hello1!

Now I change in.dat to:

SIGNAL Hello1!
SIGNAL Hello2!

Then I do awk -f tt.awk in.dat again and get:

 SIGNAL Hello2!1!

I expected to see:

ab SIGNAL Hello1! SIGNAL Hello2!

I am doing this in my CentOS shell (with a bunch of settings in my ~/.cshrc file). I checked the same steps in my Cygwin shell and there it works as I expect. Something seems to be wrong with my CentOS shell settings. What could it be?

Tags: shell awk
1 answer
我想做一个坏孩纸
#2 · 2019-07-25 21:43

This is a problem with DOS line endings (as noted by Etan Reisner in the comments above). Your second version of in.dat uses \r\n for line breaks. By default awk splits records on \n only, so the \r stays attached to the end of each line.

Using your same tt.awk code:

$ echo "SIGNAL Hello1\!\nSIGNAL Hello2\!" |awk -f tt.awk
ab SIGNAL Hello1! SIGNAL Hello2!
$ echo "SIGNAL Hello1\!\r\nSIGNAL Hello2\!" |awk -f tt.awk
 SIGNAL Hello2!1!

Wondering what this is really doing? \r (carriage return) moves the cursor back to the leftmost position in the line but does not send you down a line; that's what \n (line feed) does. DOS text files end every line with both characters, \r\n. UNIX text files end lines with \n alone, and the terminal treats the return to the leftmost position as implicit.
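To check whether a file really has DOS line endings, one option is to count the records that end in a carriage return and to dump the raw bytes. This is only a sketch: the temp file stands in for in.dat, and the variable names sample and crlf_lines are made up for the example.

```shell
# Build a sample file with DOS (\r\n) line endings; it stands in for in.dat.
sample=$(mktemp)
printf 'SIGNAL Hello1!\r\nSIGNAL Hello2!\r\n' > "$sample"

# Count the records whose last character is a carriage return.
crlf_lines=$(awk '/\r$/ { n++ } END { print n + 0 }' "$sample")
echo "lines ending in CR: $crlf_lines"

# od -c prints the raw bytes, so each \r shows up explicitly before its \n.
od -c "$sample"

rm -f "$sample"
```

A count of 0 would suggest the file already uses UNIX line endings.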

Here are some experiments to illustrate what's going on:

$ echo "SIGNAL Hello1\!\r\nSIGNAL Hello2\!"
SIGNAL Hello1!
SIGNAL Hello2!
$ echo "SIGNAL Hello1\!\rSIGNAL Hello2\!"
SIGNAL Hello2!
$ echo "ab SIGNAL Hello1\!\n SIGNAL Hello2\!"
ab SIGNAL Hello1!
 SIGNAL Hello2!
$ echo "ab SIGNAL Hello1\!\r SIGNAL Hello2\!"
 SIGNAL Hello2!1!

Pay special attention to the last two items. awk strips the \n for you but retains the \r, so c ends up containing ab SIGNAL Hello1! followed by \r and then  SIGNAL Hello2!. When that string is printed, the terminal writes ab SIGNAL Hello1!, the \r moves the cursor back to the leftmost position, and  SIGNAL Hello2! is written on top of the first line. The first line's final two characters (1!) remain because the second line wasn't long enough to overwrite them.
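The concatenation itself is intact; only the terminal display is misleading. A small sketch that makes this visible runs the same logic as tt.awk but replaces each carriage return with the marker <CR> before printing. The marker and the variable name visible are just choices for this example, and the input here has \r\n on both lines.

```shell
# Same logic as tt.awk, except the END block swaps every carriage return
# for the visible marker <CR> before printing.
visible=$(printf 'SIGNAL Hello1!\r\nSIGNAL Hello2!\r\n' | awk '
BEGIN    { a = "a"; b = "b"; c = a b }
/SIGNAL/ { c = c " " $0 }
END      { gsub(/\r/, "<CR>", c); print c }
')
echo "$visible"
# prints: ab SIGNAL Hello1!<CR> SIGNAL Hello2!<CR>
```

Both lines are present in c, each still carrying its trailing carriage return.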

Now that we know the issue, we can fix the code:

BEGIN    { a = "a"; b = "b"; c = a b; }
/SIGNAL/ { gsub(/\r/, ""); c = c " " $0; }
END      { print c; }

This removes every \r from each matching line before it is appended to c.
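Two other ways to get the same result, sketched here with inline input instead of in.dat: delete the carriage returns before awk sees the data, or strip only a trailing \r from every record. The variable names fixed_tr and fixed_sub are illustrative, and c is initialised directly to "ab" to keep the example short.

```shell
# Option 1: delete carriage returns with tr before awk reads the input.
fixed_tr=$(printf 'SIGNAL Hello1!\r\nSIGNAL Hello2!\r\n' | tr -d '\r' | awk '
BEGIN    { c = "ab" }
/SIGNAL/ { c = c " " $0 }
END      { print c }
')
echo "$fixed_tr"

# Option 2: strip only a trailing carriage return from every record,
# so any later rule sees clean input.
fixed_sub=$(printf 'SIGNAL Hello1!\r\nSIGNAL Hello2!\r\n' | awk '
BEGIN    { c = "ab" }
         { sub(/\r$/, "") }
/SIGNAL/ { c = c " " $0 }
END      { print c }
')
echo "$fixed_sub"
```

Both commands should print ab SIGNAL Hello1! SIGNAL Hello2!. Option 2 leaves a \r in the middle of a line untouched, which may or may not be what you want.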
