I want to measure the execution time of an external program whose output is used by my Python script. Calling the program that produces the output extprogram, at the moment I do something like this:
import time
import subprocess

def process_output(line):
    ...
    ...
    return processed_data

all_processed_data = []
ts = time.time()
p = subprocess.Popen("extprogram", stdout=subprocess.PIPE)
for line in p.stdout:
    all_processed_data.append(process_output(line))
te = time.time()
elapsed_time = te - ts
This doesn't work as intended, because what I am measuring is the time of execution of extprogram plus the time required to process its output. extprogram produces a large amount of data, so I would like to "stream" its output into my Python program using a loop, as I am doing now. How can I evaluate te when extprogram terminates, rather than waiting for all the output to be processed?
Since you are on Unix, you can use the time command. Here is the principle:
import sys
import subprocess

p = subprocess.Popen(["time", "ls"], stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, text=True)
for line in p.stdout:  # ls output
    sys.stdout.write(line)
time_output = p.stderr.readlines()
print("Stderr:", "".join(time_output))
On my machine, this gives:
Stderr: 0.01 real 0.00 user 0.00 sys
The total processor time is the user + sys time (real is the wall-clock time, which does not generally represent how much processor time the program used: for instance, with sleep 5, the real time is 5 seconds, while the user and sys times are 0).
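The same distinction exists inside Python itself; as a sketch, using only the standard time module (the one-second sleep is just an illustration):

```python
import time

# time.time() measures wall-clock ("real") time, while time.process_time()
# measures the CPU ("user" + "sys") time consumed by this process.
ts_wall = time.time()
ts_cpu = time.process_time()
time.sleep(1)                       # consumes wall time but almost no CPU time
wall = time.time() - ts_wall
cpu = time.process_time() - ts_cpu
print(wall)   # close to 1 second
print(cpu)    # close to 0
```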
This works because time outputs a detailed accounting of the real execution time (not simply the wall time, which depends on what other processes are running, etc.), and does so to the standard error output. You can parse the standard error and get the timing information.
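As a sketch of the parsing step, assuming the BSD-style format shown above (the exact format of time's report varies between shells and platforms, so adjust the pattern to what your system actually prints):

```python
import re

# Example stderr text from the `time` command (assumed BSD-style format):
stderr_text = "        0.01 real         0.00 user         0.00 sys"

# Extract the three timings as floats.
m = re.search(r'([\d.]+)\s+real\s+([\d.]+)\s+user\s+([\d.]+)\s+sys', stderr_text)
real_t, user_t, sys_t = (float(g) for g in m.groups())
cpu_time = user_t + sys_t   # total processor time
print(real_t, cpu_time)
```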
This method might not be practical if your program writes to the standard error itself, as that output might interfere with the parsing of the time command's report.
Also, I haven't checked that no deadlock can happen with the above code (I'm not sure what happens if the called program prints a lot to the standard error: it could block until its standard error buffer is read, which may never happen while the Python program is busy reading the standard output). That said, if you know that the timed program writes little or nothing to its standard error, I believe the code above will not deadlock.
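One way to side-step that risk, as a sketch, is to drain the standard error on a separate thread while the main thread streams the standard output; here the `sh -c` command is just a stand-in for `time extprogram`, writing to both streams:

```python
import subprocess
import threading

# Read stderr concurrently so neither pipe's OS buffer can fill up and
# block the child process.
p = subprocess.Popen(["sh", "-c", "echo data; echo 0.01 real >&2"],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

stderr_lines = []
t = threading.Thread(target=lambda: stderr_lines.extend(p.stderr.readlines()))
t.start()

out_lines = []
for line in p.stdout:        # stream stdout as before
    out_lines.append(line)   # process_output(line) would go here
t.join()
p.wait()
print("Stderr:", "".join(stderr_lines))
```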
The following still uses 'wall clock' time but may be an alternative to the use of host system time commands. The execution and the timing are split into separate threads and the timer can be stopped before any processing is carried out.
import threading
import time
import subprocess

def timing(event):
    print("timer starts")
    ts = time.time()
    event.wait()              # blocks until execution() signals completion
    te = time.time()
    elapsed_time = te - ts
    print("Elapsed Time " + str(elapsed_time))

def execution(event):
    for i in range(1000):
        p = subprocess.Popen("ls", stdout=subprocess.PIPE)
        p.communicate()       # consume the output so the pipe is closed
    event.set()

if __name__ == '__main__':
    event = threading.Event()
    e = threading.Thread(target=execution, args=(event,))
    t = threading.Thread(target=timing, args=(event,))
    t.start()
    e.start()
    while not event.is_set():
        print("running...")
        time.sleep(1)
This gives me the following output:
timer starts
running...
running...
Elapsed Time 1.66236400604
Or you could split receiving the output of 'extprogram' from processing it.
For example:
tempdata = []
ts = time.time()
p = subprocess.Popen("extprogram", stdout=subprocess.PIPE)
for line in p.stdout:
    tempdata.append(line)
te = time.time()
elapsed_time = te - ts

for line in tempdata:
    all_processed_data.append(process_output(line))
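As a further option on Unix (an assumption beyond the code above, and a different technique than parsing time's report), the standard resource module can report the accumulated CPU time of terminated child processes; note that RUSAGE_CHILDREN aggregates over all children that have been waited on, so it is only a clean measurement if extprogram is the only child. Here `ls` stands in for extprogram:

```python
import resource
import subprocess
import time

ts = time.time()
p = subprocess.Popen(["ls"], stdout=subprocess.PIPE, text=True)
lines = p.stdout.readlines()    # stream/collect the output
p.wait()                        # the child has terminated at this point
te = time.time()                # wall-clock time up to termination

# CPU time (user + sys) of all terminated, waited-on children
usage = resource.getrusage(resource.RUSAGE_CHILDREN)
child_cpu = usage.ru_utime + usage.ru_stime
print(te - ts, child_cpu)
```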