I've noticed that variable scope within a bash for loop seems to change if I pipe the output of the loop.
For example, here g remains changed after the loop:
$ g=bing; for f in foo; do g=fing; echo g in loop: $g; done; echo g after $g;
g in loop: fing
g after fing
whereas here, the change during the loop is forgotten:
$ g=bing; for f in foo; do g=fing; echo g in loop: $g; done | cat; echo g after $g;
g in loop: fing
g after bing
The value of g in the receiver of the pipe is from the "outer" context too:
$ g=bing; for f in foo; do g=fing; echo g in loop: $g; done | (cat; echo in pipe $g;); echo g after $g;
g in loop: fing
in pipe bing
g after bing
What's going on?
From the bash man page
Each command in a pipeline is executed as a separate process (i.e., in a subshell).
This means that both sides of the pipeline are run in a subshell.
From http://www.tldp.org/LDP/abs/html/subshells.html
Variables in a subshell are not visible outside the block of code in the subshell. They are not accessible to the parent process, to the shell that launched the subshell. These are, in effect, variables local to the child process.
This means that when the pipeline ends all changes to variables are lost.
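To make the connection to the question concrete, here is a minimal sketch comparing an explicit subshell, written with parentheses, to the implicit one created by the pipe. The helper variables after_parens and after_pipe are names made up for this illustration; they only record what the parent shell sees.

```bash
g=bing

# Parentheses force a subshell: the assignment happens in a child process.
( g=fing; echo "inside ( ): $g" )
after_parens=$g                  # hypothetical helper, records the parent's view
echo "after ( ): $after_parens"

# A pipeline does the same thing implicitly to the loop on its left-hand side.
for f in foo; do g=fing; echo "inside loop: $g"; done | cat
after_pipe=$g                    # hypothetical helper, records the parent's view
echo "after pipe: $after_pipe"
```

In both cases the parent shell still sees bing afterwards, because the assignment was made in a child process that has already exited.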
Here is a proof of concept for this theory using BASH_SUBSHELL
BASH_SUBSHELL
Incremented by one each time a subshell or subshell environment is spawned. The initial value is 0.
Input:
echo "before loop:$BASH_SUBSHELL"
for i in foo; do echo "in loop:$BASH_SUBSHELL"; done | (cat;echo "second pipe: $BASH_SUBSHELL")
echo "out of pipe: $BASH_SUBSHELL"
Output:
before loop:0
in loop:1
second pipe: 1
out of pipe: 0
As you can see, both the loop and the second part of the pipe run inside subshells, and those subshells end when the pipeline ends.
Edit 2
Realised it was probably clearer to do this to show the different subshells that are run
Bash <4.0
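Older versions of bash do not have $BASHPID, which is really the only built-in way to see the PID of a subshell, but you can declare a function like
GetPid(){ cut -d " " -f 4 /proc/self/stat; }
which works in much the same way: on Linux, field 4 of /proc/self/stat is the parent PID of the cut process, i.e. the PID of the (sub)shell that called the function.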
echo -n "before loop:";GetPid
for i in foo; do echo -n "in loop:";GetPid; done | (cat;echo -n "second pipe:";GetPid)
echo -n "out of pipe:";GetPid
Bash 4.0+
echo "before loop:$BASHPID"
for i in foo; do echo "in loop:$BASHPID"; done | (cat;echo "second pipe: $BASHPID")
echo "out of pipe: $BASHPID"
Output:
before loop:29985
in loop:12170
second pipe:12171
out of pipe:29985
As you can see this makes it clearer that before and after the pipeline you are in the same shell with the original variable.
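The reason $BASHPID is needed rather than $$ is that $$ keeps the PID of the original shell even inside a subshell. A small sketch of that difference, assuming bash 4.0 or later; the variable names out, inner_dollar and inner_bashpid are made up for this illustration:

```bash
# Run the loop on the left-hand side of a pipe and capture what it prints.
out=$(for i in foo; do echo "$$ $BASHPID"; done | cat)

inner_dollar=${out%% *}     # first word: $$ as seen inside the subshell
inner_bashpid=${out##* }    # second word: $BASHPID as seen inside the subshell

echo "parent \$\$:        $$"
echo "subshell \$\$:      $inner_dollar"
echo "subshell BASHPID: $inner_bashpid"
```

The first two values should be identical, while the third should differ, which is why $$ cannot be used to tell the subshells apart.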
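This also explains your third case: each side of the pipe runs in its own subshell, and each subshell starts with a copy of the parent's variables. The receiving side therefore sees the parent's value of g rather than the one set in the loop, even though both are part of the same pipeline, and the parent's value is unchanged once the pipeline has finished.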
As soon as you use a pipe (|), subshells are involved, mostly on both sides of the pipe.
Therefore the for loop runs in a subshell and sets the variable inside that subshell. That's why, after the loop, the variable in the outer shell still has its original value.
In your first example there is no subshell, just multiple commands executed after each other.
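If the goal is to keep the assignment, one option is to avoid the pipe, since a plain output redirection does not create a subshell. The following is only a sketch under that assumption; the temporary file and the variable name tmp are made up for this illustration and stand in for whatever would have consumed the pipe.

```bash
g=bing
tmp=$(mktemp)   # hypothetical scratch file standing in for the pipe

# Redirecting the loop's output does not fork a subshell,
# so the assignment happens in the current shell.
for f in foo; do g=fing; echo "g in loop: $g"; done > "$tmp"

cat "$tmp"          # consume the loop's output afterwards
rm -f "$tmp"
echo "g after: $g"  # prints "g after: fing"
```

In bash, redirecting the loop into a process substitution (done > >(cat)) is another way to keep the loop in the current shell, though the consumer's output may then appear out of order because it runs asynchronously.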