Using bash process substitution, I want to run two different commands on a file simultaneously. It is not necessary in this example, but imagine that "cat /usr/share/dict/words" was a very expensive operation, such as uncompressing a 50 GB file.
cat /usr/share/dict/words | tee >(head -1 > h.txt) >(tail -1 > t.txt) > /dev/null
After this command I would expect h.txt to contain the first line of the words file "A", and t.txt to contain the last line of the file "Zyzzogeton".
However what actually happens is that h.txt contains "A" but t.txt contains "argillaceo" which is about 5% into the file.
Why does this happen? It seems like either the "tail" process is terminating early or the streams are getting mixed up.
Running another similar command like this behaves as expected:
cat /usr/share/dict/words | tee >(grep ^a > a.txt) >(grep ^z > z.txt) > /dev/null
After this command I'd expect a.txt to contain all the words that begin with "a", while z.txt contains all of the words that begin with "z", which is exactly what happened.
So why doesn't this work with "tail", and with what other commands will this not work?
OK, what seems to happen is that once the head -1 command has its line, it exits. The next time tee tries to write to the pipe that the process substitution set up, the write fails with EPIPE and, according to man 2 write, also generates SIGPIPE in the writing process. That causes tee to exit, which forces tail -1 to exit immediately (it sees end of input after only part of the file), and the cat on the left gets a SIGPIPE as well.
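The same mechanism can be seen in isolation, without tee. The following is a minimal sketch, assuming bash and assuming SIGPIPE has its default disposition in the shell that runs it: yes writes forever, head -1 exits after one line, and the writer is killed on a later write.

```shell
#!/usr/bin/env bash
# `yes` never stops on its own; `head -1` reads one line and exits,
# which closes the read end of the pipe.
yes | head -1 > /dev/null

# Copy PIPESTATUS immediately: the next command overwrites it.
status=("${PIPESTATUS[@]}")

# With the default SIGPIPE disposition this typically reports 141 for
# `yes` (killed by the signal) and 0 for `head`.
echo "yes: ${status[0]}  head: ${status[1]}"
```

In the question's pipeline tee plays the role of yes here, except that tee dying also cuts off the input to tail.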
We can see this a little better if we add a bit more to the process substitution with head, and make the output both more predictable and also written to stderr so that it is visible without relying on tee:
for i in {1..30}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done") >(tail -1 > t.txt) >/dev/null
When I ran it, this gave me the output:
1
Head done
2
So the loop got just one more iteration before everything exited (though t.txt still only has 1 in it). If we then run
echo "${PIPESTATUS[@]}"
we see
141 141
an exit status that this question ties to SIGPIPE, in a very similar fashion to what we're seeing here.
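The 141 is not arbitrary: a shell reports a process killed by signal N as exit status 128 + N, and SIGPIPE is signal 13 on Linux. A small sketch for decoding such a status, assuming bash (whose builtin kill -l accepts an exit status and prints the signal name):

```shell
#!/usr/bin/env bash
status=141    # the value seen in PIPESTATUS

# Statuses above 128 mean "killed by signal (status - 128)".
signal_number=$((status - 128))

# bash's builtin `kill -l` maps an exit status (or signal number) to a name.
signal_name=$(kill -l "$status")

# prints: status 141 = 128 + 13 (SIGPIPE)
echo "status $status = 128 + $signal_number (SIG$signal_name)"
```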
The coreutils maintainers have added this as an example to their tee "gotchas" for posterity.
For a discussion with the devs about how this fits into POSIX compliance you can see the (closed notabug) report at http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22195
If you have access to GNU coreutils 8.24 or later, tee has some options (not in POSIX) that can help, like -p or --output-error=warn. Without that, you can take a bit of a risk but get the functionality the question wants by trapping and ignoring SIGPIPE:
trap '' PIPE
for i in {1..30}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done") >(tail -1 > t.txt) >/dev/null
trap - PIPE
This will have the expected results in both h.txt and t.txt, but if something else in the shell needed SIGPIPE to be handled normally you'd be out of luck with this approach.
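One way to limit that risk is to ignore SIGPIPE only inside a subshell, so the parent shell's handling is untouched and no trap - PIPE is needed afterwards. This is a minimal sketch, assuming bash, seq as a stand-in for the expensive producer, and a scratch directory from mktemp (workdir is a hypothetical name). The polling loop is only there because the shell does not wait for process substitutions to finish.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

(
  # Ignore SIGPIPE only inside this subshell. tee inherits the ignored
  # signal, sees EPIPE as an ordinary write error once head has exited,
  # and keeps feeding tail.
  trap '' PIPE
  seq 1 100000 |
    tee >(head -1 > "$workdir/h.txt") >(tail -1 > "$workdir/t.txt") > /dev/null
) 2> /dev/null   # hide tee's "Broken pipe" complaint, if any

# Process substitutions run asynchronously, so wait (about 5 seconds at
# most) until both result files are non-empty.
for _ in {1..50}; do
  [ -s "$workdir/h.txt" ] && [ -s "$workdir/t.txt" ] && break
  sleep 0.1
done

echo "first: $(cat "$workdir/h.txt")  last: $(cat "$workdir/t.txt")"
```

tee may still exit non-zero because one of its outputs failed, so a caller that checks PIPESTATUS has to allow for that.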
Another hacky option would be to zero out t.txt before starting, then not let the head process list finish until t.txt has non-zero length:
> t.txt; for i in {1..10}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done"; while [ ! -s t.txt ]; do sleep 1; done) >(tail -1 > t.txt; date) >/dev/null
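For comparison, the -p route mentioned earlier needs neither the trap nor the wait inside the head process list. This is a sketch under the assumption that tee comes from GNU coreutils 8.24 or later (other tee implementations may reject -p); seq stands in for the expensive producer and workdir is a hypothetical scratch directory.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# With -p (by default the same mode as --output-error=warn-nopipe),
# GNU tee keeps writing to its remaining outputs after one of them,
# here head, has closed its end of the pipe.
seq 1 100000 |
  tee -p >(head -1 > "$workdir/h.txt") >(tail -1 > "$workdir/t.txt") > /dev/null

# The shell does not wait for process substitutions, so poll (about
# 5 seconds at most) until both result files are non-empty.
for _ in {1..50}; do
  [ -s "$workdir/h.txt" ] && [ -s "$workdir/t.txt" ] && break
  sleep 0.1
done

echo "first: $(cat "$workdir/h.txt")  last: $(cat "$workdir/t.txt")"
```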