Currently, I have something like this:
    self.process = subprocess.Popen(self.cmd, stdout=subprocess.PIPE)
    out, err = self.process.communicate()
The command I'm running streams its output, and I need the call to block until the process finishes before continuing.
How do I make it so that I can capture the streaming output AND have the streaming output printed to stdout? When I set stdout=subprocess.PIPE, I can capture the output, but it won't print the output. If I leave out stdout=subprocess.PIPE, it prints the output, but communicate() will return None.
Is there a solution that would do what I'm asking for WHILE blocking until the process is terminated/completed AND avoiding the buffering and pipe deadlock issues mentioned here?
Thanks!
I can think of a few solutions.
#1: You can just go into the source, grab the code for communicate, and copy and paste it, adding in code that prints each line as it comes in as well as buffering things up. (If it's possible for your own stdout to block because of, say, a deadlocked parent, you can use a queue.Queue or something instead; there's a sketch of that variant after the thread example below.) This is obviously a bit hacky, but it's pretty easy, and will be safe.
But really, communicate is complicated because it needs to be fully general and handle cases you don't need to handle. All you need here is the central trick: throw threads at the problem. A dedicated reader thread that doesn't do anything slow or blocking between read calls is all you need.
Something like this:
    import subprocess
    import sys
    import threading

    self.process = subprocess.Popen(self.cmd, stdout=subprocess.PIPE)
    lines = []

    def reader():
        # Nothing slow or blocking happens between reads, so the pipe
        # never fills up. (On Python 3, pass text=True to Popen, or write
        # to sys.stdout.buffer, since the pipe yields bytes.)
        for line in self.process.stdout:
            lines.append(line)
            sys.stdout.write(line)

    t = threading.Thread(target=reader)
    t.start()
    self.process.wait()
    t.join()
You may need some error handling in the reader thread. And I'm not 100% sure you can safely use readline here. But this will either work, or be close.
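Here's a minimal sketch of the queue.Queue variant mentioned under #1. The names pump and printer are mine, cmd stands in for your command, and text=True (Python 3.7+) makes the pipe yield str lines:

    import queue
    import subprocess
    import sys
    import threading

    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    q = queue.Queue()
    lines = []

    def pump():
        # Only reads and enqueues; a blocked sys.stdout can't stall the pipe.
        for line in process.stdout:
            q.put(line)
        q.put(None)  # sentinel: the child closed its end of the pipe

    def printer():
        # Does the possibly-blocking writes, decoupled from the reader.
        while True:
            line = q.get()
            if line is None:
                break
            lines.append(line)
            sys.stdout.write(line)

    threading.Thread(target=pump).start()
    printer_thread = threading.Thread(target=printer)
    printer_thread.start()
    process.wait()
    printer_thread.join()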
#2: Or you can create a wrapper class that takes a file object and tees to stdout/stderr every time anyone reads from it. Then create the pipes manually, and pass in wrapped pipes, instead of using the automagic PIPE. This has the exact same issues as #1 (meaning either no issues, or you need to use a Queue or something if sys.stdout.write can block).
Something like this:
    class TeeReader(object):
        def __init__(self, input_file, tee_file):
            self.input_file = input_file
            self.tee_file = tee_file

        def read(self, size=-1):
            # Read from the wrapped file, echoing whatever came back.
            ret = self.input_file.read(size)
            if ret:
                self.tee_file.write(ret)
            return ret
In other words, it wraps a file object (or something that acts like one), and acts like a file object. (When you use PIPE, process.stdout is a real file object on Unix, but may just be something that acts like one on Windows.) Any other methods you need to delegate to input_file can probably be delegated directly, without any extra wrapping. Either try this and see which methods communicate gets AttributeErrors looking for and code those explicitly, or do the usual __getattr__ trick to delegate everything. PS, if you're worried about this "file object" idea meaning disk storage, read Everything is a file at Wikipedia.
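A sketch of that __getattr__ trick, plus one possible wiring, which is my assumption rather than something spelled out above: swap the wrapper into process.stdout so communicate reads through it. This relies on communicate taking its simple single-pipe code path (with stderr piped too, it reads via the raw file descriptors and bypasses the wrapper), and since that path does one big read, the tee only happens at EOF:

    import subprocess
    import sys

    class TeeReader(object):
        def __init__(self, input_file, tee_file):
            self.input_file = input_file
            self.tee_file = tee_file

        def read(self, size=-1):
            ret = self.input_file.read(size)
            if ret:
                self.tee_file.write(ret)
            return ret

        def __getattr__(self, name):
            # Only called for attributes not found on TeeReader itself, so
            # readline, fileno, close, etc. fall through to the real pipe.
            return getattr(self.input_file, name)

    process = subprocess.Popen(cmd, stdout=subprocess.PIPE)  # cmd: your command
    # The pipe yields bytes, so tee into sys.stdout.buffer (Python 3).
    process.stdout = TeeReader(process.stdout, sys.stdout.buffer)
    out, err = process.communicate()  # blocks; out is everything that was teed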
#3: Finally, you can grab one of the "async subprocess" modules on PyPI, or the support included in twisted or other async frameworks, and use that. (This makes it possible to avoid the deadlock problems, but it's not guaranteed; you still have to make sure to service the pipes properly.)
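For a concrete example of #3 without a third-party module, here's a sketch using the standard library's asyncio; that's my substitution for the modules named above, and the ping command is just an illustration:

    import asyncio
    import sys

    async def run_and_tee(cmd):
        process = await asyncio.create_subprocess_exec(
            *cmd, stdout=asyncio.subprocess.PIPE)
        lines = []
        # readline() yields to the event loop while waiting, so the pipe
        # is serviced as data arrives instead of filling up.
        while True:
            line = await process.stdout.readline()
            if not line:
                break
            lines.append(line)
            sys.stdout.write(line.decode())
        await process.wait()  # block until the child exits
        return b"".join(lines)

    out = asyncio.run(run_and_tee(["ping", "-c", "3", "localhost"]))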
The output goes to your calling process, essentially capturing the stdout from self.cmd, so that output does not go anywhere else. What you need to do is print it from the 'parent' process if you want to see the output.
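For instance, a minimal single-threaded sketch, assuming only stdout is piped and your own stdout isn't itself blocked (so a plain loop can't deadlock); cmd stands in for your command:

    import subprocess
    import sys

    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    captured = []
    for line in process.stdout:   # yields lines as the child produces them
        captured.append(line)     # keep a copy for later use
        sys.stdout.write(line)    # echo it so you can watch it live
    process.wait()                # block until the child exits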