How can I split and re-join STDOUT from multiple processes?

Posted 2020-06-17 04:11

Question:

I am working on a pipeline that has a few branch points that subsequently merge -- they look something like this:

         command2
        /        \
command1          command4
        \        /
         command3

Each command writes to STDOUT and accepts input via STDIN. STDOUT from command1 needs to be passed to both command2 and command3, which are run sequentially, and their output needs to be effectively concatenated and passed to command4. I initially thought that something like this would work:

$ command1 | (command2; command3) | command4

That doesn't work though, as only STDOUT from command2 is passed to command4, and when I remove command4 it's apparent that command3 isn't being passed the appropriate stream from command1 -- in other words, it's as if command2 is exhausting or consuming the stream. I get the same result with { command2 ; command3 ; } in the middle as well.
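You can see this draining behavior with toy commands -- here head typically reads more from the pipe than it prints, so the cat that follows it gets nothing:

$ seq 5 | { head -n 2; cat; }
1
2

So I figured I should be using 'tee' with process substitution, and tried this: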

$ command1 | tee >(command2) | command3 | command4

But surprisingly that didn't work either -- it appears that the output of command1 and the output of command2 are both piped into command3, which results in errors, with only the output of command3 being piped into command4.
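The merge is easy to reproduce with toy commands; in this sketch cat -n receives six lines -- the three copied through by tee plus the three from tac -- in an unpredictable order:

$ seq 3 | tee >(tac) | cat -n

I did find that the following gets the appropriate input and output to and from command2 and command3: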

$ command1 | tee >(command2) >(command3) | command4

However, this streams the output of command1 to command4 as well, which leads to issues, as command2 and command3 produce output to a different specification than command1. The solution I've arrived at seems hacky, but it does work:

$ command1 | tee >(command2) >(command3) > /dev/null | command4

That suppresses command1 passing its own output on to command4, while still collecting STDOUT from command2 and command3. It works, but I feel like I'm missing a more obvious solution. Am I? I've read dozens of threads and haven't found a solution to this problem that works in my use case, nor have I seen an elaboration of the exact problem of splitting and re-joining streams (though I can't be the first one to deal with this). Should I just be using named pipes? I tried but had difficulty getting those working as well, so maybe that's another story for another thread. I'm using bash on RHEL 5.8.
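For what it's worth, the named-pipe arrangement I was attempting looked roughly like this (fifo2 and fifo3 are just scratch names):

$ mkfifo fifo2 fifo3
$ command1 | tee fifo2 > fifo3 &
$ cat <(command2 < fifo2) <(command3 < fifo3) | command4
$ rm fifo2 fifo3

I suspect my difficulty was that this can deadlock once one branch gets more than a pipe buffer's worth ahead of the other, since cat drains all of command2's output before it touches command3's.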

Answer 1:

You can play around with file descriptors like this:

((date | tee >( wc >&3) | wc) 3>&1) | wc

or

((command1 | tee >( command2 >&3) | command3) 3>&1) | command4

To explain: tee >( wc >&3) outputs the original data on stdout, while the inner wc writes its result to file descriptor 3. The outer 3>&1 then merges the FD 3 output back into stdout, so the output from both wc commands is sent to the trailing command.

HOWEVER, there is nothing in this pipeline (or the one in your own solution) which guarantees that the output will not be mangled -- that is, that incomplete lines from command2 will not get mixed up with lines from command3. If that is a concern, you will need to do one of two things:

  1. Write your own tee-like program which internally uses popen and reads each line back, sending only complete lines to stdout for command4 to read
  2. Write the output from command2 and command3 to files, and use cat to merge the data as input to command4 (see the sketch below)
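A minimal sketch of option 2 (the file names are arbitrary):

$ command1 > step1.out
$ command2 < step1.out > step2.out
$ command3 < step1.out > step3.out
$ cat step2.out step3.out | command4
$ rm step1.out step2.out step3.out

Because cat concatenates each file whole and in order, the merged stream can never interleave partial lines from command2 and command3 -- the trade-off is that nothing streams, since each stage must finish before the next starts.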


Answer 2:

Please see also https://unix.stackexchange.com/questions/28503/how-can-i-send-stdout-to-multiple-commands. Among all the answers there, I found this one fits my need particularly well.

Expanding a little bit on @Soren's answer:

$ ((date | tee >( wc >&3) | wc) 3>&1) | cat -n
     1         1       6      29
     2         1       6      29

You can also do it without tee, using a shell variable instead (note that this buffers command1's whole output in memory rather than streaming it):

$ (z=$(date); (echo "$z"| wc ); (echo "$z"| wc) ) | cat -n
     1         1       6      29
     2         1       6      29

In my case, I applied this technique to write a much more complex script that runs under busybox.



Answer 3:

I believe your solution is good, and it uses tee as documented. If you read the man page of tee, it says:

Copy standard input to each FILE, and also to standard output

Your FILEs are the process substitutions.

And the copy sent to standard output is what you need to get rid of, because you don't want it -- which is exactly what you did by redirecting it to /dev/null.
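As a quick sanity check with stand-in commands (mirroring the date/wc examples above):

$ date | tee >(wc) >(wc) > /dev/null | cat -n
     1         1       6      29
     2         1       6      29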