bash: piping output from a loop seems to change the scope within the loop

I've noticed that variable scope within a bash for loop seems to change if I pipe the output of the loop.

For example, here g remains changed after the loop:

$ g=bing; for f in foo; do g=fing; echo g in loop: $g; done; echo g after $g;
g in loop: fing
g after fing

whereas here, the change during the loop is forgotten:

$ g=bing; for f in foo; do g=fing; echo g in loop: $g; done | cat; echo g after $g;
g in loop: fing
g after bing

The value of g in the receiver of the pipe is from the "outer" context too:

$ g=bing; for f in foo; do g=fing; echo g in loop: $g; done | (cat; echo in pipe $g;); echo g after $g;
g in loop: fing
in pipe bing
g after bing

What's going on?

Answer 1:

From the bash man page:

Each command in a pipeline is executed as a separate process (i.e., in a subshell).

This means that each side of the pipeline runs in its own subshell.

From http://www.tldp.org/LDP/abs/html/subshells.html:

Variables in a subshell are not visible outside the block of code in the subshell. They are not accessible to the parent process, to the shell that launched the subshell. These are, in effect, variables local to the child process.

This means that when the pipeline ends, any changes made to variables inside it are lost.
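The same effect can be reproduced without any pipe, using an explicit ( ... ) subshell. A minimal sketch reusing the question's variable g:

```shell
#!/usr/bin/env bash
# Sketch: an assignment made inside ( ... ) only changes the subshell's copy of g.
g=bing
( g=fing; echo "inside subshell: $g" )   # prints: inside subshell: fing
echo "after subshell: $g"                # prints: after subshell: bing
```

A pipeline behaves the same way, except that bash creates the subshells implicitly.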

Here is a proof of concept for this theory using BASH_SUBSHELL, which the bash man page describes as follows:

BASH_SUBSHELL: Incremented by one each time a subshell or subshell environment is spawned. The initial value is 0.

Input:

echo "before loop:$BASH_SUBSHELL"
for i in foo; do echo "in loop:$BASH_SUBSHELL"; done | (cat;echo "second pipe: $BASH_SUBSHELL")
echo "out of pipe: $BASH_SUBSHELL"

Output:

before loop:0
in loop:1
second pipe: 1
out of pipe: 0

As you can see, both the loop and the second part of the pipeline run inside subshells, and those subshells exit when the pipeline finishes.
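One way to see that the change is not "forgotten" immediately, but only lost when the subshell exits, is to print g at the end of the left-hand side of the pipe. This is a sketch; grouping the loop and the echo with { ... } is just for illustration, so that both run in the same pipeline subshell:

```shell
#!/usr/bin/env bash
# Sketch: the left side of the pipe is one subshell; g stays "fing" inside it
# until that subshell exits at the end of the pipeline.
g=bing
{ for f in foo; do g=fing; done; echo "end of left side: $g"; } | cat   # end of left side: fing
echo "after pipeline: $g"                                              # after pipeline: bing
```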

Edit 2

I realised it is probably clearer to print the process ID of each part, which shows the different subshells that are run:

Bash <4.0

Older versions of bash don't have $BASHPID, which is really the only way to see the PID of a subshell ($$ always expands to the PID of the original shell, even inside a subshell), but you can declare a function like this:

GetPid(){ cut -d " " -f 4 /proc/self/stat; }

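which works pretty much the same way on Linux: /proc/self/stat describes the cut process itself, and its fourth field is the PID of cut's parent, i.e. the (sub)shell that ran GetPid. You can then use it like this: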

echo -n "before loop:";GetPid
for i in foo; do echo -n "in loop:";GetPid; done | (cat;echo -n "second pipe:";GetPid)
echo -n "out of pipe:";GetPid
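On a Linux system with bash 4.0 or later, you can sanity-check GetPid against $BASHPID. The sketch below (run as a bash script; the labels are only illustrative) prints both values in the main shell and inside one subshell, alongside $$, which keeps reporting the main shell's PID:

```shell
#!/usr/bin/env bash
# Assumes Linux (for /proc/self/stat) and bash >= 4.0 (for $BASHPID).
GetPid(){ cut -d " " -f 4 /proc/self/stat; }

echo -n "main shell GetPid:  "; GetPid
echo "main shell BASHPID: $BASHPID"
(
  # Inside an explicit subshell, GetPid and $BASHPID should agree,
  # while $$ still expands to the PID of the main shell.
  echo -n "subshell GetPid:    "; GetPid
  echo "subshell BASHPID:   $BASHPID"
  echo "subshell \$\$:        $$"
)
```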

Bash 4.0+

echo "before loop:$BASHPID"
for i in foo; do echo "in loop:$BASHPID"; done | (cat;echo "second pipe:$BASHPID")
echo "out of pipe:$BASHPID"

Output:

before loop:29985
in loop:12170
second pipe:12171
out of pipe:29985

As you can see, this makes it clearer that before and after the pipeline you are in the same shell (PID 29985), which still holds the original value of the variable.
This also explains your third case: the two sides of the pipe run in different subshells (12170 and 12171), and each one starts with its own copy of the parent shell's variables. The right-hand side therefore sees the parent's value, bing, rather than the value the loop assigned in its own subshell, even though both commands belong to the same pipeline.
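Since the two sides share no variables, the only thing that crosses the pipe is output text. A sketch of the third case that makes this visible (from_loop is a hypothetical name chosen for illustration):

```shell
#!/usr/bin/env bash
# Sketch: both sides of the pipe start with their own copy of g (bing);
# the loop's new value reaches the right side only as text read from the pipe.
g=bing
for f in foo; do g=fing; echo "$g"; done |
  { read -r from_loop; echo "right side: g=$g, read from pipe: $from_loop"; }
echo "after pipeline: g=$g"
```

Running it should print `right side: g=bing, read from pipe: fing` followed by `after pipeline: g=bing`.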

Answer 2:

As soon as you use a pipe (|), subshells are involved, usually on both sides of the pipe.

Therefore the for loop runs in a subshell and sets the variable only inside that subshell. That's why, after the loop, the variable still has its original value.

In your first example there is no subshell, just several commands executed one after another in the current shell.
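Redirecting the loop's output to a file, instead of piping it, also keeps the loop in the current shell, so the assignment survives just as in the first example. A minimal sketch; the temporary file from mktemp is only an assumption for illustration:

```shell
#!/usr/bin/env bash
# Sketch: a plain redirection does not create a subshell, so g keeps the loop's value.
g=bing
tmp=$(mktemp)
for f in foo; do g=fing; echo "g in loop: $g"; done > "$tmp"
cat "$tmp"          # g in loop: fing
rm -f "$tmp"
echo "g after $g"   # g after fing
```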