backslash in gawk fields

Posted 2019-06-10 09:00

I've just been pushed into checking all my output files with gawk, which I avoid as much as I can. How does

gawk 'NF \!= 6' file

differ from

gawk 'NF != 6' file 

that is, how does the backslash change the meaning of this expression?

Should it output lines with number of fields different than 6 and ending with backslash?

I'm getting the following error on my files:

gawk:    ^ backslash not last character on line

Anybody?

Tags: bash awk gawk
4 answers
混吃等死
#2 · 2019-06-10 09:16

If you're trying to match lines that don't have 6 fields and that do end in a backslash, this is one way to do that:

gawk -v 'patt=\\\\$' 'NF != 6 && $0 ~ patt' file

Gawk (and other AWKs) have some complex rules regarding backslash escaping. That's why there are four backslashes in the preceding command. (The dollar sign represents the end of the input line from the data file, as in any regex.)
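
As a rough illustration of the four-backslash form, here is a minimal sketch. The sample data and the names AWK, sample and result are made up for the example, and it assumes either gawk or another POSIX awk is on the PATH.

```shell
# Assumption: use gawk when installed, otherwise whatever awk is on PATH.
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data: lines 2 and 3 end in a backslash,
# but only line 3 also has a field count other than 6.
sample=$(mktemp)
printf '%s\n' 'a b c d e f' 'a b c d e f\' 'a b c\' 'a b c' > "$sample"

# Single quotes: the shell passes patt=\\\\$ through untouched.
# The -v assignment halves the backslashes to \\$, and the regex
# engine then reads \\ as one literal backslash anchored by $.
result=$("$AWK" -v 'patt=\\\\$' 'NF != 6 && $0 ~ patt' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Only the line that both ends in a backslash and has a field count other than 6 should be selected.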

混吃等死
#3 · 2019-06-10 09:21

If you use double quotes instead of single quotes then ! is a special character and should be escaped with a backslash. Importantly, you are escaping the exclamation point so that your shell does not see it.

gawk "NF \!= 6" file

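Within double quotes, a csh-like shell converts \! to ! before passing the argument to gawk, so the backslash is gone by the time gawk is invoked. Bash is different: the backslash stops history expansion but is left in the argument, so gawk still sees \!.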

With single quotes, though, the shell will ignore ! characters, so there's no need to escape them with backslashes. In fact, as you found out, it is a syntax error to do so, since the backslash ends up being passed to gawk, which barfs on the unexpected \.
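
To see the single-quote case for yourself, something like the following sketch may help. The sample file and variable names are hypothetical, and it assumes a POSIX sh or bash script with gawk or another awk available.

```shell
# Assumption: use gawk when installed, otherwise whatever awk is on PATH.
AWK=$(command -v gawk || command -v awk)

# Hypothetical two-line sample: one line with 6 fields, one with 3.
sample=$(mktemp)
printf '%s\n' 'a b c d e f' 'a b c' > "$sample"

# Single quotes: awk receives the backslash and is expected to reject the program.
"$AWK" 'NF \!= 6' "$sample" 2>/dev/null || echo "escaped form rejected by awk"

# Without the backslash the program is valid.
result=$("$AWK" 'NF != 6' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```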

乱世女痞
#4 · 2019-06-10 09:33

The line without the backslash works as expected. For the record, a backslash is usually used to escape special characters (so that they lose their special meaning and stand for themselves) and to split long lines, so under a shell you could write something like:

$ gawk 'NF \
!= 6' file

and it would have the same effect.

Your example in particular is a bit trickier. You put the program within single quotes, which stops the shell from modifying what you wrote, so it is passed to gawk verbatim. With your backslash expression, gawk finds a '\' in a place where it has no meaning (in gawk it is only used to split long lines and to escape characters in strings and regular expressions). In my two-line example, gawk receives two lines joined by a backslash (conceptually one line).
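
A quick way to check the line-continuation claim is to compare both forms on the same input. This is only a sketch: the sample data and the names one_line and two_lines are invented for the example.

```shell
# Assumption: use gawk when installed, otherwise whatever awk is on PATH.
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample with 6, 3 and 7 fields per line.
sample=$(mktemp)
printf '%s\n' 'a b c d e f' 'a b c' 'a b c d e f g' > "$sample"

one_line=$("$AWK" 'NF != 6' "$sample")

# A backslash immediately followed by a newline: awk joins the two lines.
two_lines=$("$AWK" 'NF \
!= 6' "$sample")

if [ "$one_line" = "$two_lines" ]; then
    echo "both forms select the same lines"
fi

rm -f "$sample"
```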

我想做一个坏孩纸
#5 · 2019-06-10 09:33

Whether you use double or single quotes, if you are using a Bourne-like shell, gawk will see the program exactly as it appears between the quotes. Even in double quotes, both Bourne and csh-like shells only consume \ before characters that might need escaping (like $, and in the case of csh, ! - thus in csh this program would appear syntactically correct to gawk, though it still wouldn't do what you want).
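
One way to check what a Bourne-like shell actually hands to gawk is to print the quoted argument instead of running it. The sketch below assumes a POSIX sh or bash script (not csh); the variable names single and double are made up.

```shell
# Capture each quoted program exactly as a command would receive it.
single=$(printf '%s' 'NF \!= 6')
double=$(printf '%s' "NF \!= 6")

printf 'single quotes: %s\n' "$single"
printf 'double quotes: %s\n' "$double"

# In a Bourne-like shell the backslash survives both kinds of quoting.
if [ "$single" = "$double" ]; then
    echo "both arguments are identical"
fi
```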

\! has no meaning to gawk in this context, so it gives an error. To "output lines with number of fields different than 6 and ending with backslash", use:

gawk 'NF != 6 && /\\$/' file

That is: match lines that don't have 6 fields, and which match \ immediately preceding end of line ($). The \ must be escaped with another backslash, because gawk too uses \ for escaping - though in the case of gawk, all \ (except those escaped by another \) are absorbed; those that don't escape a special character are simply elided.

With no associated action, the default action (print the line) will be taken when this conditional statement is satisfied.
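
For what it's worth, here is a small sketch of that command on made-up data, next to the same pattern with the print action spelled out. The sample file and variable names are hypothetical.

```shell
# Assumption: use gawk when installed, otherwise whatever awk is on PATH.
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample data: only the third line has a field count
# other than 6 and also ends in a backslash.
sample=$(mktemp)
printf '%s\n' 'a b c d e f' 'a b c d e f\' 'a b c\' 'a b c' > "$sample"

# Pattern only: the default action prints the matching line.
implicit=$("$AWK" 'NF != 6 && /\\$/' "$sample")

# Same pattern with the action written out explicitly.
explicit=$("$AWK" 'NF != 6 && /\\$/ { print $0 }' "$sample")

printf '%s\n' "$implicit"

rm -f "$sample"
```

Both forms should select the same line, since omitting the action is shorthand for printing the record.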
