bash functions: enclosing the body in braces vs. parentheses

Posted 2019-03-08 19:23

Question:

Usually, bash functions are defined using curly braces to enclose the body:

foo()
{
    ...
}

While working on a shell script today that makes extensive use of functions, I ran into a problem: a variable that has the same name in the calling and in the called function is in fact one and the same variable. I then found out that this can be prevented by declaring the variables inside the function as local: local var=xyz.
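The name clash described above can be reproduced with a minimal sketch like the following. All function names and the variable i are hypothetical and chosen only for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical names (helper_*, caller_*, i), for illustration only.

helper_global() {
    i=99            # no `local`: assigns to the caller's i
}

helper_local() {
    local i=99      # `local`: this i exists only while helper_local runs
}

caller_global() {
    local i=1
    helper_global
    echo "after helper_global: i=$i"
}

caller_local() {
    local i=1
    helper_local
    echo "after helper_local: i=$i"
}

out_global=$(caller_global)
out_local=$(caller_local)
echo "$out_global"   # after helper_global: i=99
echo "$out_local"    # after helper_local: i=1
```

Note that a variable declared local in the caller is still visible to the functions it calls, which is why helper_global can overwrite it.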

Then, at some point, I discovered a thread (Defining bash function body using parenthesis instead of braces) which explains that it is just as valid to define a function using parentheses, like this:

foo()
(
    ...
)

The effect is that the function body is executed in a subshell, so the function gets its own variable scope and I can define variables without local. Since a function-local scope seems to make much more sense, and to be much safer, than all variables being global, I immediately asked myself:

  • Why are braces used by default to enclose the function body instead of parentheses?

However, I quickly also discovered a major downside to executing the function in a subshell, specifically that exiting the script from inside a function doesn't work anymore, instead forcing me to work with the return status along the whole call tree (in case of nested functions). This leads me to this follow-up question:
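A minimal sketch of that downside, using hypothetical function names. Each case is run inside a command substitution so that the demonstration script itself keeps running.

```bash
#!/usr/bin/env bash
# Hypothetical function names, for illustration only.

fail_in_subshell() (
    exit 3                          # leaves only the subshell
)

fail_in_braces() {
    exit 3                          # leaves the shell that called the function
}

run_subshell_case() {
    fail_in_subshell
    echo "still running, status=$?" # reached: the caller survived the exit
}

run_braces_case() {
    fail_in_braces
    echo "never printed"            # not reached
}

out_sub=$(run_subshell_case)

status_braces=0
out_braces=$(run_braces_case) || status_braces=$?

echo "$out_sub"                                      # still running, status=3
echo "braces: output='$out_braces' status=$status_braces"
```

With a parenthesized body, the caller has to propagate the failure itself, for example with fail_in_subshell || exit $?.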

  • Are there other major downsides (*) to using parentheses instead of braces (which might explain why braces seem to be preferred)?

(*) I'm aware (from exception-related discussions I've stumbled upon over time) that some would argue that explicitly using the error status is much better than being able to exit from anywhere, but I prefer the latter.

Apparently both styles have their advantages and disadvantages. So I hope some of you more experienced bash users can give me some general guidance:

  • When shall I use curly braces to enclose the function body, and when is it advisable to switch to parentheses?

EDIT: Take-aways from the answers

Thanks for your answers, my head's now a bit clearer with regard to this. So what I take away from the answers is:

  • Stick to the conventional curly braces, if only in order not to confuse potential other users/developers of the script (and even use the braces if the whole body is wrapped in parentheses).

  • The only real disadvantage of the curly braces is that any variable in the parent scope can be changed, although in some situations this might be an advantage. This can easily be circumvented by declaring the variables as local.

  • Using parentheses, on the other hand, might have some serious unwanted effects, such as messing up exits, leading to problems with killing a script, and isolating the variable scope.

Answer 1:

Why are braces used by default to enclose the function body instead of parentheses?

The body of a function can be any compound command. This is typically { list; }, but the other compound commands are technically allowed as well: (list), ((expression)), [[ expression ]], and even loops and conditionals such as for, while, if, and case.
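To illustrate, here is a small sketch of function bodies that are not brace groups. The function names are hypothetical, and this style is shown only to make the grammar concrete, not as a recommendation.

```bash
#!/usr/bin/env bash
# Hypothetical function names, for illustration only.

# Arithmetic command as the body: succeeds if the argument is greater than 0.
is_positive() (( $1 > 0 ))

# Conditional command as the body: succeeds if the argument ends in .yaml.
is_yaml() [[ $1 == *.yaml ]]

# A for loop as the body: prints each argument on its own line.
print_args() for arg in "$@"; do printf '%s\n' "$arg"; done

is_positive 5 && echo "5 is positive"
is_yaml app.yaml && echo "app.yaml looks like YAML"
print_args a b
```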

C and languages in the C family like C++, Java, C#, and JavaScript all use curly braces to delimit function bodies. Curly braces are the most natural syntax for programmers familiar with those languages.

Are there other major downsides (*) to using parentheses instead of braces (which might explain why braces seem to be preferred)?

Yes. There are numerous things you can't do from a sub-shell, including:

  • Change global variables. Variable changes will not propagate to the parent shell.
  • Exit the script. An exit statement will exit only the sub-shell.

Starting a sub-shell can also be a serious performance hit. You're launching a new process each time you call the function.
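One way to see the extra process is to compare $BASHPID, which bash sets to the process ID of the current shell process. The sketch below uses hypothetical function names and writes to a temporary file rather than using $(...), because a command substitution would itself start a subshell and blur the comparison.

```bash
#!/usr/bin/env bash
# Hypothetical function names, for illustration only.

show_pid_braces() { echo "$BASHPID"; }   # runs in the calling shell
show_pid_parens() ( echo "$BASHPID" )    # runs in a forked subshell

main_pid=$BASHPID
tmp=$(mktemp)

show_pid_braces > "$tmp"; braces_pid=$(< "$tmp")
show_pid_parens > "$tmp"; parens_pid=$(< "$tmp")
rm -f "$tmp"

echo "main=$main_pid braces=$braces_pid parens=$parens_pid"
```

The braces variant reports the same PID as the main shell, while the parentheses variant reports a different one on every call.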

You might also get weird behavior if your script is killed. The signals the parent and child shells receive will change. It's a subtle effect, but if you have trap handlers or you kill your script, those parts may not work the way you want.

When shall I use curly braces to enclose the function body, and when is it advisable to switch to parentheses?

I would advise you to always use curly braces. If you want an explicit sub-shell, then add a set of parentheses inside the curly braces. Using just parentheses is highly unusual syntax and would confuse many people reading your script.

foo() {
   (
       subshell commands;
   )
}


Answer 2:

It really matters. Bash functions do not return values (only an exit status), and the variables they use come from the global scope (that is, a function can access variables from "outside" it). The usual way to handle the output of a function is therefore to have it store the value in a variable and then read that variable after the call.

When you define a function with (), you are right: it will create a sub-shell. That sub-shell starts with the same variable values as the original shell, but it cannot modify the originals. So you lose the ability to change global variables.

See an example:

$ cat a.sh
#!/bin/bash

func_braces() { # function with curly braces
    echo "in $FUNCNAME. the value of v=$v"
    v=4
}

func_parentheses() ( # function with parentheses (sub-shell)
    echo "in $FUNCNAME. the value of v=$v"
    v=8
)


v=1
echo "v=$v. Let's start"
func_braces
echo "Value after func_braces is: v=$v"
func_parentheses
echo "Value after func_parentheses is: v=$v"

Let's execute it:

$ ./a.sh
v=1. Let's start
in func_braces. the value of v=1
Value after func_braces is: v=4
in func_parentheses. the value of v=4
Value after func_parentheses is: v=4   # the value did not change in the main shell
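If a function body does run in a sub-shell, one common way to still get a result out of it is to print the result and capture it with command substitution. A minimal sketch, with a hypothetical function and variable names:

```bash
#!/usr/bin/env bash
# Hypothetical function and variable names, for illustration only.

sum() (
    total=$(( $1 + $2 ))   # changes only the sub-shell's copy of total
    echo "$total"          # the result travels back through standard output
)

total=0
result=$(sum 2 3)

echo "result=$result total=$total"   # result=5 total=0
```

The same pattern works for a function defined with curly braces, since the command substitution runs it in a sub-shell either way.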


Answer 3:

I tend to use a subshell when I want to change directories, but always from the same original directory, and cannot be bothered to use pushd/popd or manage the directories myself.

for d in */; do
    ( cd "$d" && dosomething )
done

This would work as well from a function body, but even if you define the function with curly braces, it is still possible to use it from a subshell.

doit() {
    cd "$1" && dosomething
}
for d in */; do
    ( doit "$d" )
done

Of course, you can still maintain variable scope inside a curly-brace-defined function using declare or local:

myfun() {
    local x=123
}

So I would say, explicitly define your function as a subshell only if not being a subshell is detrimental to the obvious correct behavior of that function.

Trivia: As a side note, consider that bash actually always treats the function as a curly-brace compound command. It just sometimes has parentheses in it:

$ f() ( echo hi )
$ type f
f is a function
f () 
{ 
    ( echo hi )
}