Tee with process substitution misunderstanding

Posted 2019-07-11 07:29

Question:

I am trying to write a pretty printer for LDAP entries which only fetches the root LDAP record once and then pipes the output into tee that invokes the pretty printer for each section.

For illustration's sake, say my group_entry function returns the LDIF of a specific LDAP DN. The details aren't important, so let's say it always returns:

dn: cn=foo,dc=example,dc=com
cn: foo
owner: uid=foo,dc=example,dc=com
owner: uid=bar,dc=example,dc=com
member: uid=foo,dc=example,dc=com
member: uid=baz,dc=example,dc=com
member: uid=quux,dc=example,dc=com
custom: abc123

I can easily extract the owners and members separately with a bit of grep'ing and cut'ing, then pipe those secondary DNs into another LDAP search query to get their real names. For the sake of example, let's say I have a pretty_print function, parametrised on the LDAP attribute name, which does all of that and then formats everything nicely with AWK:

$ group_entry | pretty_print owner
Owners:
foo    Mr Foo
bar    Dr Bar

$ group_entry | pretty_print member
Members:
foo    Mr Foo
baz    Bazzy McBazFace
quux   The Artist Formerly Known as Quux

These work fine individually, but when I try to tee them together, nothing happens:

$ group_entry | tee >(pretty_print owner) | pretty_print member
Members:
[Sits there waiting for Ctrl+C]

Obviously I have some misunderstanding about how this is supposed to work, but it escapes me. What am I doing wrong?


EDIT: For the sake of completeness, here's my full script:

#!/usr/bin/env bash

set -eu -o pipefail

LDAPSEARCH="ldapsearch -xLLL"

group_entry() {
  local group="$1"
  ${LDAPSEARCH} "(&(objectClass=posixGroup)(cn=${group}))"
}

get_attribute() {
  local attr="$1"
  grep "${attr}:" | cut -d" " -f2
}

get_names() {
  # We strip blank lines out of the LDIF entry, then we always have "dn"
  # followed by "cn" records; we strip off the attribute name and
  # concatenate those lines, then sort. So we get a sorted list of:
  # {{distinguished_name}} {{real_name}}
  xargs -n1 -J% ${LDAPSEARCH} -s base -b % cn \
  | grep -v "^$" \
  | cut -d" " -f2- \
  | paste - - \
  | sort
}

pretty_print() {
  local attr="$1"
  local -A pretty=([member]="Members" [owner]="Owners")

  get_attribute "${attr}" \
  | get_names \
  | gawk -F'\t' -v title="${pretty[${attr}]}:" '
    BEGIN { print title }
    { print "-", gensub(/^uid=([^,]+),.*$/, "\\1", "g", $1), "\t", $2 }
  '
}

# FIXME I don't know why tee with process substitution doesn't work here
group_entry "$1" | pretty_print owner
group_entry "$1" | pretty_print member

Answer 1:

The behavior you describe looks very much like a situation that can arise in a C program that forks and exec's another program (as the shell and xargs both certainly do) without properly handling all the open file descriptors. You can be left in a situation where a process p1 does not terminate because it's waiting to observe EOF on its standard input, but it never will do because another process p2 holds an open file descriptor for the write end of the pipe that provides p1's standard input, and p2 is itself waiting for p1 to terminate or perform some other action.
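The effect is easy to see in isolation. Here is a small sketch using a named pipe, with made-up names (fifo_dir, reader_pid) and nothing taken from your script. It assumes bash and a system with mkfifo. The reader cannot finish until the last write end is closed, even though nothing more will ever be written:

```bash
#!/usr/bin/env bash
# Illustration only: a reader on a pipe sees EOF only once every
# open copy of the write end has been closed.

fifo_dir=$(mktemp -d)
mkfifo "$fifo_dir/pipe"

# Background reader: counts lines and returns only when it sees EOF.
wc -l <"$fifo_dir/pipe" >"$fifo_dir/count" &
reader_pid=$!

# Open the write end on fd 3, write one line, and keep fd 3 open.
exec 3>"$fifo_dir/pipe"
echo "hello" >&3

sleep 1
if kill -0 "$reader_pid" 2>/dev/null; then
  reader_state_before="still waiting"   # no EOF yet: fd 3 is still open
else
  reader_state_before="finished"
fi

# Closing the last write end delivers EOF, so the reader can finish.
exec 3>&-
wait "$reader_pid"
line_count=$(tr -d ' ' <"$fifo_dir/count")

# Prints: before close: still waiting; lines read: 1
echo "before close: $reader_state_before; lines read: $line_count"
rm -rf "$fifo_dir"
```

In a real hang the role of fd 3 is played by some other process that inherited the write end across a fork and never closed it.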

Nevertheless, I don't see anything inherently wrong with your pipeline in that regard, and I do not reproduce the hang with this simpler model ...

echo "foo" | tee >(cat) | cat

... in version 4.2.46 of bash. There may nevertheless be a related bug in your version of bash (even if it's the same one) or in xargs, but that's speculative. I do not think that your pipeline should hang as you say it does, but I'm not prepared to start pointing fingers.

In any event, even if your pipeline did not hang, it would not have the semantics you want, as @chepner pointed out in comments. pretty_print member reads the output of tee on its standard input, and because the process substitution inherits tee's standard output, that stream includes both the output of group_entry and the output of pretty_print owner. You could consider implementing it differently: since tee can write to more than one extra destination, you may kill two birds with one stone by doing this:

group_entry "$1" | tee >(pretty_print owner) >(pretty_print member)

But that leaves open the possibility that the output of the two pretty_print executions will be intermingled, and it also echoes the group_entry output. You could conceivably filter out the group_entry output, but to avoid the intermingling, you need to ensure that the two pretty_print commands run sequentially. That presents a problem for a tee-based approach, because if any of tee's outputs blocks then the whole pipeline can stall.
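Both points can be checked without an LDAP server. The following is a minimal sketch in which group_entry is replaced by a stub and pretty_print by two hypothetical stand-ins (tag and pick) that are not part of your script. It assumes bash, since it relies on a process substitution inheriting the standard output of the command it is attached to:

```bash
#!/usr/bin/env bash
# Stand-ins for illustration only; no LDAP involved.
group_entry() { printf 'owner: uid=foo\nmember: uid=baz\n'; }
tag()  { sed "s/^/[$1] /"; }   # prefix every input line with a label
pick() { grep "^$1:"; }        # keep only the lines for one attribute

# 1. Original shape: the process substitution writes into the same pipe
#    as tee, so the last command reads the raw entry AND the tagged copy.
downstream=$(group_entry | tee >(tag owner) | LC_ALL=C sort)
printf '%s\n' "$downstream"

# 2. Two process substitutions, with tee's own copy discarded. The raw
#    entry is no longer echoed, but the relative order of the two
#    outputs is not guaranteed.
combined=$(group_entry | tee >(pick owner) >(pick member) >/dev/null)
printf '%s\n' "$combined"
```

The first pipeline delivers four lines to its last stage: the two raw lines plus the two tagged ones. The second yields only the two filtered lines, in whichever order the substitutions happen to write them.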

One solution would be to redirect the output of one or both pretty_print commands to a file. Alternatively, if it is essential that both outputs go to stdout, then I see no good alternative but to capture the group_entry output, and feed it separately to each pretty_print job. You could capture it to a file, but that's unnecessary, and a bit messy. Consider this instead:

entry_lines=$(group_entry "$1")
pretty_print owner  <<<"$entry_lines"
pretty_print member <<<"$entry_lines"

That uses command substitution to capture the output of group_entry in a shell variable (including newlines), and uses a here string to replay it into each pretty_print process.
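Here is a self-contained way to try this, again with a stub in place of the real group_entry and a hypothetical pick function standing in for pretty_print. One detail worth knowing: command substitution strips trailing newlines and the here string adds exactly one back, which is harmless for line-oriented tools such as grep.

```bash
#!/usr/bin/env bash
# Stub that emits two attribute lines followed by trailing blank lines.
group_entry() { printf 'owner: uid=foo\nmember: uid=baz\n\n\n'; }
pick() { grep "^$1:"; }   # hypothetical stand-in for pretty_print

entry_lines=$(group_entry)   # trailing newlines are stripped here

# The here string appends a single newline, so wc counts two lines.
replayed_line_count=$(wc -l <<<"$entry_lines" | tr -d ' ')

# The two consumers run strictly one after the other, so their
# output cannot interleave.
report=$(pick owner <<<"$entry_lines"; pick member <<<"$entry_lines")
printf '%s\n' "$report"
echo "lines replayed: $replayed_line_count"
```

Because each here string is fed from the same variable, the expensive lookup runs once, and the owner output always precedes the member output.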