Currently, I have something like this:
    self.process = subprocess.Popen(self.cmd, stdout=subprocess.PIPE)
    out, err = self.process.communicate()
The command I'm running streams its output, and I need the call to block until the process finishes before continuing.
How do I make it so that I can capture the streaming output AND have the streaming output printed to stdout? When I set stdout=subprocess.PIPE, I can capture the output, but it won't print the output. If I leave out stdout=subprocess.PIPE, it prints the output, but communicate() will return None.
Is there a solution that would do what I'm asking for WHILE blocking until the process is terminated/completed AND avoiding the buffering and pipe deadlock issues mentioned here?
Thanks!
I can think of a few solutions.
#1: You can just go into the source, grab the code for communicate, and copy and paste it, adding in code that prints each line as it comes in as well as buffering things up. (If it's possible for your own stdout to block because of, say, a deadlocked parent, you can use a queue.Queue or something instead; there's a sketch of that variant after the thread example below.) This is obviously a bit hacky, but it's pretty easy, and will be safe.
But really, communicate is complicated because it needs to be fully general and handle cases you don't need to handle. All you need here is the central trick: throw threads at the problem. A dedicated reader thread that doesn't do anything slow or blocking between read calls is all you need.
Something like this:
    import subprocess
    import sys
    import threading

    self.process = subprocess.Popen(self.cmd, stdout=subprocess.PIPE)
    lines = []

    def reader():
        # Nothing slow or blocking happens between reads, so the pipe
        # never fills up. (On Python 3, pass text=True to Popen, or write
        # to sys.stdout.buffer, since the pipe yields bytes.)
        for line in self.process.stdout:
            lines.append(line)
            sys.stdout.write(line)

    t = threading.Thread(target=reader)
    t.start()
    self.process.wait()
    t.join()
You may need some error handling in the reader thread. And I'm not 100% sure you can safely use readline here. But this will either work, or be close.
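Here's a minimal sketch of the queue.Queue variant mentioned under #1. The names pump and printer are mine, cmd stands in for your command, and text=True (Python 3.7+) makes the pipe yield str lines:

    import queue
    import subprocess
    import sys
    import threading

    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    q = queue.Queue()
    lines = []

    def pump():
        # Only reads and enqueues; a blocked sys.stdout can't stall the pipe.
        for line in process.stdout:
            q.put(line)
        q.put(None)  # sentinel: the child closed its end of the pipe

    def printer():
        # Does the possibly-blocking writes, decoupled from the reader.
        while True:
            line = q.get()
            if line is None:
                break
            lines.append(line)
            sys.stdout.write(line)

    threading.Thread(target=pump).start()
    printer_thread = threading.Thread(target=printer)
    printer_thread.start()
    process.wait()
    printer_thread.join()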
#2: Or you can create a wrapper class that takes a file object and tees to stdout/stderr every time anyone reads from it. Then create the pipes manually, and pass in wrapped pipes, instead of using the automagic PIPE. This has the exact same issues as #1 (meaning either no issues, or you need to use a Queue or something if sys.stdout.write can block).
Something like this:
    class TeeReader(object):
        def __init__(self, input_file, tee_file):
            self.input_file = input_file
            self.tee_file = tee_file

        def read(self, size=-1):
            # Read from the wrapped file, echoing whatever came back.
            ret = self.input_file.read(size)
            if ret:
                self.tee_file.write(ret)
            return ret
In other words, it wraps a file object (or something that acts like one), and acts like a file object. (When you use PIPE, process.stdout is a real file object on Unix, but may just be something that acts like one on Windows.) Any other methods you need to delegate to input_file can probably be delegated directly, without any extra wrapping. Either try this and see which methods communicate gets AttributeErrors looking for and code those explicitly, or do the usual __getattr__ trick to delegate everything. PS, if you're worried about this "file object" idea meaning disk storage, read Everything is a file at Wikipedia.
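A sketch of that __getattr__ trick, plus one possible wiring, which is my assumption rather than something spelled out above: swap the wrapper into process.stdout so communicate reads through it. This relies on communicate taking its simple single-pipe code path (with stderr piped too, it reads via the raw file descriptors and bypasses the wrapper), and since that path does one big read, the tee only happens at EOF:

    import subprocess
    import sys

    class TeeReader(object):
        def __init__(self, input_file, tee_file):
            self.input_file = input_file
            self.tee_file = tee_file

        def read(self, size=-1):
            ret = self.input_file.read(size)
            if ret:
                self.tee_file.write(ret)
            return ret

        def __getattr__(self, name):
            # Only called for attributes not found on TeeReader itself, so
            # readline, fileno, close, etc. fall through to the real pipe.
            return getattr(self.input_file, name)

    process = subprocess.Popen(cmd, stdout=subprocess.PIPE)  # cmd: your command
    # The pipe yields bytes, so tee into sys.stdout.buffer (Python 3).
    process.stdout = TeeReader(process.stdout, sys.stdout.buffer)
    out, err = process.communicate()  # blocks; out is everything that was teed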
#3: Finally, you can grab one of the "async subprocess" modules on PyPI, or the support included in twisted or other async frameworks, and use that. (This makes it possible to avoid the deadlock problems, but it's not guaranteed; you still have to make sure to service the pipes properly.)
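For a concrete example of #3 without a third-party module, here's a sketch using the standard library's asyncio; that's my substitution for the modules named above, and the ping command is just an illustration:

    import asyncio
    import sys

    async def run_and_tee(cmd):
        process = await asyncio.create_subprocess_exec(
            *cmd, stdout=asyncio.subprocess.PIPE)
        lines = []
        # readline() yields to the event loop while waiting, so the pipe
        # is serviced as data arrives instead of filling up.
        while True:
            line = await process.stdout.readline()
            if not line:
                break
            lines.append(line)
            sys.stdout.write(line.decode())
        await process.wait()  # block until the child exits
        return b"".join(lines)

    out = asyncio.run(run_and_tee(["ping", "-c", "3", "localhost"]))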
The output goes to your calling process, essentially capturing the stdout from self.cmd, so that output does not go anywhere else. What you need to do is print it from the 'parent' process if you want to see the output.
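For instance, a minimal single-threaded sketch, assuming only stdout is piped and your own stdout isn't itself blocked (so a plain loop can't deadlock); cmd stands in for your command:

    import subprocess
    import sys

    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
    captured = []
    for line in process.stdout:   # yields lines as the child produces them
        captured.append(line)     # keep a copy for later use
        sys.stdout.write(line)    # echo it so you can watch it live
    process.wait()                # block until the child exits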