Yesterday it was suggested to me that using command substitution in bash causes an unnecessary subshell to be spawned. The advice was specific to this use case:
# Extra subshell spawned
foo=$(command; echo $?)
# No extra subshell
command
foo=$?
As best I can figure this appears to be correct for this use case. However, a quick search trying to verify this leads to reams of confusing and contradictory advice. It seems popular wisdom says ALL usage of command substitution will spawn a subshell. For example:
The command substitution expands to the output of commands. These commands are executed in a subshell, and their stdout data is what the substitution syntax expands to. (source)
This seems simple enough unless you keep digging, in which case you'll start finding suggestions that this is not the case.
Command substitution does not necessarily invoke a subshell, and in most cases won't. The only thing it guarantees is out-of-order evaluation: it simply evaluates the expressions inside the substitution first, then evaluates the surrounding statement using the results of the substitution. (source)
This seems reasonable, but is it true? This answer to a subshell related question tipped me off that man bash has this to note:
Each command in a pipeline is executed as a separate process (i.e., in a subshell).
This brings me to the main question. What, exactly, will cause command substitution to spawn a subshell that would not have been spawned anyway to execute the same commands in isolation?
Please consider the following cases and explain which ones incur the overhead of an extra subshell:
# Case #1
command1
var=$(command1)
# Case #2
command1 | command2
var=$(command1 | command2)
# Case #3
command1 | command2 ; var=$?
var=$(command1 | command2 ; echo $?)
Does each of these pairs incur the same number of subshells to execute? Is there a difference in POSIX vs. bash implementations? Are there other cases where using command substitution would spawn a subshell where running the same set of commands in isolation would not?
Update and caveat:
This answer has a troubled past in that I confidently claimed things that turned out not to be true. I believe it has value in its current form, but please help me eliminate other inaccuracies (or convince me that it should be deleted altogether).
I've substantially revised - and mostly gutted - this answer after @kojiro pointed out that my testing methods were flawed (I originally used ps to look for child processes, but that's too slow to always detect them); a new testing method is described below.
I originally claimed that not all bash subshells run in their own child process, but that turns out not to be true.
As @kojiro states in his answer, some shells - other than bash - DO sometimes avoid creation of child processes for subshells, so, generally speaking in the world of shells, one should not assume that a subshell implies a child process.
As for the OP's cases in bash (assumes that command{n} instances are simple commands):
It looks like using command substitution ($(...)) always adds an extra subshell in bash - as does enclosing any command in (...).
I believe, but am not certain, that these results are correct; here's how I tested (bash 3.2.51 on OS X 10.9.1) - please tell me if this approach is flawed:
- Monitored fork() calls in the 1st shell with sudo dtruss -t fork -f -p {pidOfShell1} (the -f is necessary to also trace fork() calls "transitively", i.e. to include those created by subshells themselves).
- Used only the builtin : (no-op) in the test commands (to avoid muddling the picture with additional fork() calls for external executables); specifically:
:
$(:)
: | :
$(: | :)
: | :; :
$(: | :; :)
- Only counted those dtruss output lines that contained a non-zero PID (as each child process also reports the fork() call that created it, but with PID 0).
- [...] fork().
Below is what I still believe to be correct from my original post: when bash creates subshells.
bash creates subshells in the following situations:
- for commands enclosed in parentheses ((...)), except inside [[ ... ]], where parentheses are only used for logical grouping.
- for every segment of a pipeline (|), including the first one. Thus, modifications of subshells in earlier pipeline segments do not affect later ones. (By design, commands in a pipeline are launched simultaneously - sequencing only happens through their connected stdin/stdout pipes.) bash 4.2+ has shell option lastpipe (OFF by default), which causes the last pipeline segment NOT to run in a subshell.
- for command substitution ($(...))
- for process substitution (<(...)) [...] exec (<(exec ...)).
- for background execution (&)
Combining these constructs will result in more than one subshell.
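One way to see these subshells at work is to assign a variable inside each construct and check whether the change survives. The sketch below is illustrative only: the variable names (x, after_parens, and so on) are made up for the example, and the lastpipe part assumes bash 4.2+ with job control off, as is the default in a non-interactive script.

```shell
#!/usr/bin/env bash
# Illustrative sketch: an assignment made inside a subshell is lost when
# that subshell exits. All variable names here are hypothetical.

x=outer
( x=parens )                 # explicit (...) subshell
after_parens=$x

x=outer
: "$(x=cmdsub)"              # command substitution
after_cmdsub=$x

x=outer
echo piped | read -r x       # pipeline segment (lastpipe is off by default)
after_pipe=$x

x=outer
{ x=background; } &          # background execution
wait
after_bg=$x

x=outer
{ x=braces; }                # brace group: runs in the current shell
after_braces=$x

printf '%s\n' "parens=$after_parens cmdsub=$after_cmdsub pipe=$after_pipe bg=$after_bg braces=$after_braces"
# prints: parens=outer cmdsub=outer pipe=outer bg=outer braces=braces

# Optional: with lastpipe (bash 4.2+) and job control off, the last
# pipeline segment runs in the current shell, so the assignment survives.
if shopt -s lastpipe 2>/dev/null; then
    x=outer
    echo piped | read -r x
    echo "lastpipe=$x"
    shopt -u lastpipe
fi
```

Only the brace group, which is not a subshell, changes x in the calling shell.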
In Bash, a subshell always executes in a new process space. You can verify this fairly trivially in Bash 4, which has the $BASHPID and $$ variables:
in practice:
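A minimal sketch of such a check, assuming Bash 4 or later (for $BASHPID). The PIDs printed will differ from run to run, and the variable names are only illustrative:

```shell
#!/usr/bin/env bash
# $$ keeps reporting the PID of the original shell, even inside a subshell;
# $BASHPID reports the PID of the bash process that is currently executing.
main_pid=$$
main_bashpid=$BASHPID
echo "main shell:           $main_pid-$main_bashpid"

( echo "explicit subshell:    $$-$BASHPID" )

cmdsub_pid=$(echo "$$")             # still the original shell's PID
cmdsub_bashpid=$(echo "$BASHPID")   # PID of the command-substitution child
echo "command substitution: $cmdsub_pid-$cmdsub_bashpid"

echo "$$-$BASHPID" | { read -r left; echo "pipeline:             $left | $$-$BASHPID"; }
```

In every line after the first, the second number of each pair should differ from the first, which indicates a separate process.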
About the only case where the shell can elide an extra subshell is when you pipe to an explicit subshell:
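A hedged sketch of such a pipeline follows. It assumes Bash 4+ and a POSIX-style ps; the temporary file and the variable names are only for illustration. If the parenthesized subshell were forked a second time, its parent would be an intermediate process rather than the main shell.

```shell
#!/usr/bin/env bash
main_pid=$BASHPID
out=$(mktemp)

# The right-hand side of the pipe is an explicit subshell. It records its
# own PID and asks ps for the PID of its parent process.
echo | (
    read -r line
    me=$BASHPID
    echo "$me $(ps -o ppid= -p "$me" 2>/dev/null)" >"$out"
)

read -r sub_pid sub_parent <"$out"
rm -f "$out"

echo "main=$main_pid subshell=$sub_pid parent=${sub_parent:-unknown}"
```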
Here, the subshell implied by the pipe is explicitly applied, but not duplicated.
This varies from some other shells that try very hard to avoid fork-ing. Therefore, while I feel the argument made in js-shell-parse misleading, it is true that not all shells always fork for all subshells.