The Context
I am using the subprocess
module to start a process from python. I want to be able to access the output (stdout, stderr) as soon as it is written/buffered.
- The solution must support Windows 7. I require a solution for Unix systems too but I suspect the Windows case is more difficult to solve.
- The solution should support Python 2.6. I am currently restricted to Python 2.6 but solutions using later versions of Python are still appreciated.
- The solution should not use third party libraries. Ideally I would love a solution using the standard library but I am open to suggestions.
- The solution must work for just about any process. Assume there is no control over the process being executed.
The Child Process
For example, imagine I want to run a python file called counter.py
via a subprocess
. The contents of counter.py
is as follows:
import sys
for index in range(10):
# Write data to standard out.
sys.stdout.write(str(index))
# Push buffered data to disk.
sys.stdout.flush()
The Parent Process
The parent process responsible for executing the counter.py
example is as follows:
import subprocess
command = ['python', 'counter.py']
process = subprocess.Popen(
cmd,
bufsize=1,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
The Issue
Using the counter.py
example I can access the data before the process has completed. This is great! This is exactly what I want. However, removing the sys.stdout.flush()
call prevents the data from being accessed at the time I want it. This is bad! This is exactly what I don't want. My understanding is that the flush()
call forces the data to be written to disk and before the data is written to disk it exists only in a buffer. Remember I want to be able to run just about any process. I do not expect the process to perform this kind of flushing but I still expect the data to be available in real time (or close to it). Is there a way to achieve this?
A quick note about the parent process. You may notice I am using bufsize=0
for line buffering. I was hoping this would cause a flush to disk for every line but it doesn't seem to work that way. How does this argument work?
You will also notice I am using subprocess.PIPE
. This is because it appears to be the only value which produces IO objects between the parent and child processes. I have come to this conclusion by looking at the Popen._get_handles
method in the subprocess
module (I'm referring to the Windows definition here). There are two important variables, c2pread
and c2pwrite
which are set based on the stdout
value passed to the Popen
constructor. For instance, if stdout
is not set, the c2pread
variable is not set. This is also the case when using file descriptors and file-like objects. I don't really know whether this is significant or not but my gut instinct tells me I would want both read and write IO objects for what I am trying to achieve - this is why I chose subprocess.PIPE
. I would be very grateful if someone could explain this in more detail. Likewise, if there is a compelling reason to use something other than subprocess.PIPE
I am all ears.
Method For Retrieving Data From The Child Process
import time
import subprocess
import threading
import Queue
class StreamReader(threading.Thread):
"""
Threaded object used for reading process output stream (stdout, stderr).
"""
def __init__(self, stream, queue, *args, **kwargs):
super(StreamReader, self).__init__(*args, **kwargs)
self._stream = stream
self._queue = queue
# Event used to terminate thread. This way we will have a chance to
# tie up loose ends.
self._stop = threading.Event()
def stop(self):
"""
Stop thread. Call this function to terminate the thread.
"""
self._stop.set()
def stopped(self):
"""
Check whether the thread has been terminated.
"""
return self._stop.isSet()
def run(self):
while True:
# Flush buffered data (not sure this actually works?)
self._stream.flush()
# Read available data.
for line in iter(self._stream.readline, b''):
self._queue.put(line)
# Breather.
time.sleep(0.25)
# Check whether thread has been terminated.
if self.stopped():
break
cmd = ['python', 'counter.py']
process = subprocess.Popen(
cmd,
bufsize=1,
stdout=subprocess.PIPE,
)
stdout_queue = Queue.Queue()
stdout_reader = StreamReader(process.stdout, stdout_queue)
stdout_reader.daemon = True
stdout_reader.start()
# Read standard out of the child process whilst it is active.
while True:
# Attempt to read available data.
try:
line = stdout_queue.get(timeout=0.1)
print '%s' % line
# If data was not read within time out period. Continue.
except Queue.Empty:
# No data currently available.
pass
# Check whether child process is still active.
if process.poll() != None:
# Process is no longer active.
break
# Process is no longer active. Nothing more to read. Stop reader thread.
stdout_reader.stop()
Here I am performing the logic which reads standard out from the child process in a thread. This allows for the scenario in which the read is blocking until data is available. Instead of waiting for some potentially long period of time, we check whether there is available data, to be read within a time out period, and continue looping if there is not.
I have also tried another approach using a kind of non-blocking read. This approach uses the ctypes
module to access Windows system calls. Please note that I don't fully understand what I am doing here - I have simply tried to make sense of some example code I have seen in other posts. In any case, the following snippet doesn't solve the buffering issue. My understanding is that it's just another way to combat a potentially long read time.
import os
import subprocess
import ctypes
import ctypes.wintypes
import msvcrt
cmd = ['python', 'counter.py']
process = subprocess.Popen(
cmd,
bufsize=1,
stdout=subprocess.PIPE,
)
def read_output_non_blocking(stream):
data = ''
available_bytes = 0
c_read = ctypes.c_ulong()
c_available = ctypes.c_ulong()
c_message = ctypes.c_ulong()
fileno = stream.fileno()
handle = msvcrt.get_osfhandle(fileno)
# Read available data.
buffer_ = None
bytes_ = 0
status = ctypes.windll.kernel32.PeekNamedPipe(
handle,
buffer_,
bytes_,
ctypes.byref(c_read),
ctypes.byref(c_available),
ctypes.byref(c_message),
)
if status:
available_bytes = int(c_available.value)
if available_bytes > 0:
data = os.read(fileno, available_bytes)
print data
return data
while True:
# Read standard out for child process.
stdout = read_output_non_blocking(process.stdout)
print stdout
# Check whether child process is still active.
if process.poll() != None:
# Process is no longer active.
break
Comments are much appreciated.
Cheers