Bash command groups: Why do curly braces require a semicolon?

Posted 2020-02-09 06:45

Question:

I know the difference in purpose between parentheses () and curly braces {} when grouping commands in bash.

But why does the curly brace construct require a semicolon after the last command, whereas for the parentheses construct, the semicolon is optional?

$ while false; do ( echo "Hello"; echo "Goodbye"; ); done
$ while false; do ( echo "Hello"; echo "Goodbye" ); done
$ while false; do { echo "Hello"; echo "Goodbye"; }; done
$ while false; do { echo "Hello"; echo "Goodbye" }; done
bash: syntax error near unexpected token `done'
$ 

I'm looking for some insight as to why this is the case. I'm not looking for answers such as "because the documentation says so" or "because it was designed that way". I'd like to know why it was designed this way. Or is it just a historical artifact?

This may be observed in at least the following versions of bash:

  • GNU bash, version 3.00.15(1)-release (x86_64-redhat-linux-gnu)
  • GNU bash, version 3.2.48(1)-release (x86_64-apple-darwin12)
  • GNU bash, version 4.2.25(1)-release (x86_64-pc-linux-gnu)

Answer 1:

Because { and } are only recognized as special syntax if they are the first word in a command.


There are two important points here, both of which are found in the definitions section of the bash manual. The first is the list of metacharacters:

metacharacter

A character that, when unquoted, separates words. A metacharacter is a blank or one of the following characters: ‘|’, ‘&’, ‘;’, ‘(’, ‘)’, ‘<’, or ‘>’.

That list includes parentheses but not braces (neither curly nor square). Note that it is not a complete list of characters with special meaning to the shell, but it is a complete list of characters which separate tokens. So { and } do not separate tokens, and will only be considered tokens themselves if they are delimited on both sides by metacharacters, such as a space or a semicolon.
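A quick way to see the tokenization difference is to feed a few one-liners to a child shell. This is only a sketch: it assumes bash is on the PATH, and the variable names are made up for the illustration.

```shell
# "(" is a metacharacter, so "(echo" splits into the tokens "(" and "echo".
paren_out=$(bash -c '(echo hello)' 2>&1)
echo "parentheses, no spaces: $paren_out"

# "{" is not a metacharacter, so "{echo" is one ordinary word. bash tries to
# run a command literally named "{echo" with the argument "hello}" and fails.
bash -c '{echo hello}' 2>/dev/null
brace_status=$?
echo "braces, no spaces: exit status $brace_status"

# With a space after "{" and a ";" before "}", both braces stand alone as
# words in command position and the group works.
spaced_out=$(bash -c '{ echo hello; }' 2>&1)
echo "braces, delimited: $spaced_out"
```

The middle case fails at run time rather than at parse time, because as far as the parser is concerned it is just a simple command with an odd name.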

Although braces are not metacharacters, they are treated specially by the shell in parameter expansion (e.g. ${foo}) and brace expansion (e.g. foo.{c,h}). Other than that, they are just normal characters. There is no problem with naming a file {ab}, for example, or }{, since those words do not conform to the syntax of either parameter expansion (which requires a $ before the {) or brace expansion (which requires at least one comma, or a sequence expression such as {1..3}, between { and }). For that matter, you could use { or } as a filename without ever having to quote the symbols, as long as the name is not the first word of the command. Similarly, you can call a file if, done or time without having to think about quoting the name.
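The filename claim is easy to try in a scratch directory. The following is a sketch under a few assumptions: mktemp -d is available, bash is on the PATH for the brace-expansion line, and the variable names are only illustrative.

```shell
# Work in a throwaway directory so nothing real is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# None of these words is the first word of the command, and none matches the
# syntax of brace or parameter expansion, so each is passed to touch as is.
touch {ab} }{ { } if done time
file_count=$(($(ls | wc -l)))
ls
echo "files created: $file_count"

# By contrast, a comma between the braces triggers brace expansion in bash.
expanded=$(bash -c 'echo foo.{c,h}')
echo "$expanded"

# Clean up.
cd - >/dev/null && rm -r "$demo_dir"
```

Seven distinct files are created, one per argument, with no quoting anywhere.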

These latter tokens are "reserved words":

reserved word

A word that has a special meaning to the shell. Most reserved words introduce shell flow control constructs, such as for and while.

The definitions section of the bash manual doesn't contain a complete list of reserved words, which is unfortunate, but they certainly include the POSIX-designated:

!    {    }
case do   done elif else
esac fi   for  if   in
then until while

as well as the extensions implemented by bash (and some other shells):

[[   ]]
function  select time

These words are not the same as built-ins (such as [), because they are actually part of the shell syntax. The built-ins could be implemented as functions or shell scripts, but reserved words cannot because they change the way that the shell parses the command line.
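bash itself will report which category a word falls into through type -t, which prints "keyword" for reserved words and "builtin" for built-ins. The helper below is a hypothetical wrapper written for this illustration; it calls bash explicitly so that it behaves the same when pasted into another shell.

```shell
# kind_of WORD: ask bash how it classifies WORD as a command name.
# (kind_of is a made-up helper name, not part of bash.)
kind_of() {
    bash -c 'type -t "$1"' _ "$1"
}

for word in '{' '}' 'if' 'done' '[[' '[' 'echo'; do
    printf '%-5s %s\n' "$word" "$(kind_of "$word")"
done
```

The braces and [[ are reported as keywords, while [ is a builtin, which matches the distinction drawn above.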

There is one very important feature of reserved words, which is not actually highlighted in the bash manual but is made very explicit in POSIX (from which the above lists of reserved words were taken, except for time):

This recognition [as a reserved word] shall only occur when none of the characters is quoted and when the word is used as:

  • The first word of a command …

(The full list of places where reserved words are recognized is slightly longer, but the above is a pretty good summary.) In other words, reserved words are only reserved when they are the first word of a command. And, since { and } are reserved words, they are only special syntax if they are the first word in a command.

Example:

ls }  # } is not a reserved word. It is an argument to `ls`
ls;}  # } is a reserved word; `ls` has no arguments
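Applied to the commands from the question, the same rule can be sketched as follows. This assumes bash is on the PATH; the variable names are only for the illustration.

```shell
# "}" right after ";" is the first word of a new command, so it is the
# reserved word that closes the group.
with_semicolon=$(bash -c '{ echo Hello; echo Goodbye; }')

# A newline ends a command just as ";" does, so this form works too.
with_newline=$(bash -c '{ echo Hello; echo Goodbye
}')

# Without a terminator, "}" is merely the last argument of the second echo,
# the group is never closed, and bash reports a syntax error.
bash -c '{ echo Hello; echo Goodbye }' 2>/dev/null
unterminated_status=$?

# Outside command position "}" is ordinary text.
literal=$(bash -c 'echo Goodbye }')

printf '%s\n' "$with_semicolon" "$with_newline" "$literal"
echo "exit status without terminator: $unterminated_status"
```

The parenthesized form needs no terminator because ) is a metacharacter and therefore ends the preceding word and command on its own.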

There is lots more I could write about shell parsing, and bash parsing in particular, but it would rapidly get tedious. (For example, the rule about when # starts a comment and when it is just an ordinary character.) The approximate summary is: "don't try this at home"; really, the only thing which can parse shell commands is a shell. And don't try to make sense of it: it's just a random collection of arbitrary choices and historical anomalies, many but not all based on the need to not break ancient shell scripts with new features.