I want to measure the execution time of an external program whose output is used by my Python script. Calling the program that produces the output extprogram, at the moment I do something like this:
import time
import subprocess

def process_output(line):
    ...
    ...
    return processed_data

all_processed_data = []
ts = time.time()
p = subprocess.Popen("extprogram", stdout=subprocess.PIPE)
for line in p.stdout:
    all_processed_data.append(process_output(line))
te = time.time()
elapsed_time = te - ts
This doesn't work as intended, because what I am measuring is the time of execution of extprogram plus the time required to process its output. extprogram produces a large amount of data, so I would like to "stream" its output into my Python program using a loop, as I am doing now. How can I evaluate te when extprogram terminates, rather than waiting for all the output to be processed?
Since you are on Unix, you can use the time command. Here is the principle:
import sys
import subprocess

p = subprocess.Popen(["time", "ls"], stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, text=True)
for line in p.stdout:  # ls output
    sys.stdout.write(line)
time_output = p.stderr.readlines()
print("Stderr:", "".join(time_output))
On my machine, this gives:
Stderr: 0.01 real 0.00 user 0.00 sys
The total processor time is the user + sys time (real is the wall-clock time, which does not generally represent how much processor time the program used: for instance, with sleep 5, the real time is 5 seconds, while the user and sys times are 0).
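The same distinction exists inside Python itself; as a sketch, using only the standard time module (the one-second sleep is just an illustration):

```python
import time

# time.time() measures wall-clock ("real") time, while time.process_time()
# measures the CPU ("user" + "sys") time consumed by this process.
ts_wall = time.time()
ts_cpu = time.process_time()
time.sleep(1)                       # consumes wall time but almost no CPU time
wall = time.time() - ts_wall
cpu = time.process_time() - ts_cpu
print(wall)   # close to 1 second
print(cpu)    # close to 0
```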
This works because time outputs a detailed accounting of the real execution time (not simply the wall time, which depends on what other processes are running, etc.), and does so to the standard error output. You can parse the standard error and get the timing information.
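As a sketch of the parsing step, assuming the BSD-style format shown above (the exact format of time's report varies between shells and platforms, so adjust the pattern to what your system actually prints):

```python
import re

# Example stderr text from the `time` command (assumed BSD-style format):
stderr_text = "        0.01 real         0.00 user         0.00 sys"

# Extract the three timings as floats.
m = re.search(r'([\d.]+)\s+real\s+([\d.]+)\s+user\s+([\d.]+)\s+sys', stderr_text)
real_t, user_t, sys_t = (float(g) for g in m.groups())
cpu_time = user_t + sys_t   # total processor time
print(real_t, cpu_time)
```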
This method might not be practical if your program writes to the standard error itself, as that output might interfere with the parsing of the time command's report.
Also, I haven't checked that no deadlock can happen with the above code (I'm not sure what happens if the called program prints a lot to the standard error: it could block until its standard error buffer is read, which may never happen while the Python program is busy reading the standard output). That said, if you know that the timed program writes little or nothing to its standard error, I believe the code above will not deadlock.
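One way to side-step that risk, as a sketch, is to drain the standard error on a separate thread while the main thread streams the standard output; here the `sh -c` command is just a stand-in for `time extprogram`, writing to both streams:

```python
import subprocess
import threading

# Read stderr concurrently so neither pipe's OS buffer can fill up and
# block the child process.
p = subprocess.Popen(["sh", "-c", "echo data; echo 0.01 real >&2"],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

stderr_lines = []
t = threading.Thread(target=lambda: stderr_lines.extend(p.stderr.readlines()))
t.start()

out_lines = []
for line in p.stdout:        # stream stdout as before
    out_lines.append(line)   # process_output(line) would go here
t.join()
p.wait()
print("Stderr:", "".join(stderr_lines))
```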
The following still uses 'wall clock' time but may be an alternative to the use of host system time commands. The execution and the timing are split into separate threads and the timer can be stopped before any processing is carried out.
import threading
import time
import subprocess

def timing(event):
    print("timer starts")
    ts = time.time()
    event.wait()              # blocks until execution() signals completion
    te = time.time()
    elapsed_time = te - ts
    print("Elapsed Time " + str(elapsed_time))

def execution(event):
    for i in range(1000):
        p = subprocess.Popen("ls", stdout=subprocess.PIPE)
        p.communicate()       # consume the output so the pipe is closed
    event.set()

if __name__ == '__main__':
    event = threading.Event()
    e = threading.Thread(target=execution, args=(event,))
    t = threading.Thread(target=timing, args=(event,))
    t.start()
    e.start()
    while not event.is_set():
        print("running...")
        time.sleep(1)
This gives me the following output:
timer starts
running...
running...
Elapsed Time 1.66236400604
Or you could split receiving the output of 'extprogram' from processing it.
For example:
tempdata = []
ts = time.time()
p = subprocess.Popen("extprogram", stdout=subprocess.PIPE)
for line in p.stdout:
    tempdata.append(line)
te = time.time()
elapsed_time = te - ts

for line in tempdata:
    all_processed_data.append(process_output(line))
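As a further option on Unix (an assumption beyond the code above, and a different technique than parsing time's report), the standard resource module can report the accumulated CPU time of terminated child processes; note that RUSAGE_CHILDREN aggregates over all children that have been waited on, so it is only a clean measurement if extprogram is the only child. Here `ls` stands in for extprogram:

```python
import resource
import subprocess
import time

ts = time.time()
p = subprocess.Popen(["ls"], stdout=subprocess.PIPE, text=True)
lines = p.stdout.readlines()    # stream/collect the output
p.wait()                        # the child has terminated at this point
te = time.time()                # wall-clock time up to termination

# CPU time (user + sys) of all terminated, waited-on children
usage = resource.getrusage(resource.RUSAGE_CHILDREN)
child_cpu = usage.ru_utime + usage.ru_stime
print(te - ts, child_cpu)
```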