I want to create a pipe between 3 commands:
import subprocess

cat = subprocess.Popen("cat /etc/passwd", stdout=subprocess.PIPE)
grep = subprocess.Popen("grep '<username>'", stdin=cat.stdout, stdout=subprocess.PIPE)
cut = subprocess.Popen("cut -f 3 -d ':'", stdin=grep.stdout, stdout=subprocess.PIPE)
for line in cut.stdout:
    pass  # process each line here
But the Python documentation says:
Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.
Then how should I use cut.stdout? Can someone explain the documentation?
The external process you've spawned may block forever if you use process.stdin.write without any awareness of possible buffering issues. For example, if the process responds to your 1-line input by writing a large amount of data (say, 10-100 MB) to its stdout, and you keep writing to its stdin without reading that data, then the process will block on its write to stdout (stdout is an unnamed pipe, and the OS maintains a buffer of a fixed size for it). Once the process stops reading its stdin, your own writes will eventually block as well, and neither side can make progress.
You can try the iterpipes library, which deals with these issues by running input and output tasks as separate threads.
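With only the standard library, communicate() covers the case described above, since it feeds the child's stdin and drains its stdout at the same time. The following is a minimal sketch, not a definitive implementation: the helper name echo_through_child is hypothetical, and the child is a small Python one-liner that copies stdin to stdout, standing in for whatever external tool you actually run.

```python
import subprocess
import sys

# Stand-in child process (an assumption for illustration): it streams its
# stdin back to its stdout, so a large input produces an equally large output.
CHILD_CODE = (
    "import shutil, sys; "
    "shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)"
)


def echo_through_child(data: bytes) -> bytes:
    """Send data to the child and return everything it wrote to stdout."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    # communicate() writes the input and reads the output concurrently,
    # then closes stdin and waits for the child to exit. Writing all of
    # `data` with child.stdin.write() before reading anything could block
    # once both pipe buffers are full.
    out, _ = child.communicate(data)
    return out


if __name__ == "__main__":
    payload = b"x" * 1_000_000  # much larger than a typical pipe buffer
    print(len(echo_through_child(payload)))
```

Note that communicate() collects the whole output in memory, so it suits outputs of moderate size rather than unbounded streams.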
communicate is designed to prevent a deadlock that wouldn't occur in your application anyway: it is there primarily for the situation where both stdin and stdout on a Popen object are pipes to the calling process, i.e.
subprocess.Popen(["sometool"], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
In your case, you can safely read from cut.stdout. You may use communicate if you find it convenient, but you don't need to.
(Note that subprocess.Popen("/etc/passwd") doesn't make sense; you seem to have forgotten a cat. Also, don't forget shell=True.)
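For reference, here is one way the three-command pipeline might look when only the last stdout is read by the calling process. This is a sketch under a few assumptions: it passes argument lists instead of using shell=True, the function name uid_fields and the passwd_path parameter are made up for illustration, and cat, grep and cut are expected to be on the PATH.

```python
import subprocess


def uid_fields(username, passwd_path="/etc/passwd"):
    """Return the third ':'-separated field of every line matching username."""
    cat = subprocess.Popen(["cat", passwd_path], stdout=subprocess.PIPE)
    grep = subprocess.Popen(
        ["grep", username], stdin=cat.stdout, stdout=subprocess.PIPE
    )
    # Close the parent's copy of the pipe so that cat can receive SIGPIPE
    # if grep exits early.
    cat.stdout.close()
    cut = subprocess.Popen(
        ["cut", "-f", "3", "-d", ":"],
        stdin=grep.stdout,
        stdout=subprocess.PIPE,
        text=True,  # decode output so lines are str rather than bytes
    )
    grep.stdout.close()

    # Only cut.stdout is connected to this process and nothing is written
    # to any stdin from here, so iterating over it line by line is safe.
    fields = [line.rstrip("\n") for line in cut.stdout]
    cut.stdout.close()
    for proc in (cat, grep, cut):
        proc.wait()
    return fields


if __name__ == "__main__":
    print(uid_fields("root"))
```

Passing lists avoids shell quoting issues with the username; with shell=True the original command strings would work as well.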