Piped Python script takes 100% of CPU when reading

Posted 2019-07-05 02:05

Question:

I have two Python scripts running on an Ubuntu Linux machine. The first one writes all its output to stdout; the second one reads from stdin. They are connected by a simple pipe, i.e. something like this:

./step1.py <some_args> | ./step2.py <some_other_args>

step2 reads lines of input in an infinite loop and processes them:

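import sys

while True:
    try:
        l = sys.stdin.readline()
        # processing here
    except Exception:  # the question does not show this clause; assumed to swallow errors
        pass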

Step1 crashes from time to time. When that happens (I'm not sure whether always, but at least on several occasions), instead of crashing or stopping, step2 goes crazy and starts taking 100% of the CPU until I manually kill it.

Why is this happening and how can I make step2 more robust so that it stops when the pipe is broken?

Thanks!

Answer 1:

Others already explained why you end up in an endless loop in certain cases.
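To make the cause concrete: in Python, readline() on a file that has reached end of file returns an empty string immediately instead of blocking or raising, and it keeps doing so on every later call. When step1 exits, the write end of the pipe closes and step2 hits exactly that EOF, so a while True loop around readline() spins at full speed. The minimal sketch below uses os.pipe() as a stand-in for the shell pipe; the function name read_after_writer_exits is hypothetical.

```python
import os


def read_after_writer_exits(calls: int = 4) -> list:
    """Call readline() repeatedly on a pipe whose writer has already exited."""
    read_fd, write_fd = os.pipe()
    os.write(write_fd, b"line 1\nline 2\n")
    os.close(write_fd)  # simulate step1 exiting: no writers remain

    results = []
    with os.fdopen(read_fd) as reader:  # text-mode file object, like sys.stdin
        for _ in range(calls):
            results.append(reader.readline())
    return results


if __name__ == "__main__":
    print(read_after_writer_exits())
    # ['line 1\n', 'line 2\n', '', '']
```

After the buffered data is consumed, every call returns '' without waiting, which is what turns the question's loop into a busy loop.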

In the second (reading) script, you can use the idiom:

import sys

for line in sys.stdin:
    process(line)  # process() stands for step2's own handling of a line

This way you will not end up in an endless loop: the for loop ends as soon as stdin reaches end of file, which is what step2 sees when step1 exits. Furthermore, you did not actually show which exception you are trying to catch in the second script, but I guess that from time to time you'll experience a 'broken pipe' error, which you can and should catch as described here: How to handle a broken pipe (SIGPIPE) in python?
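A runnable sketch of the for-loop idiom, with the processing step passed in as a function so it can be exercised without a real pipe. The name run_step2 is hypothetical, and io.StringIO stands in for sys.stdin; in the real script you would call run_step2(sys.stdin, process).

```python
import io
from typing import Callable, TextIO


def run_step2(stream: TextIO, process: Callable[[str], None]) -> int:
    """Feed every line of `stream` to `process`; return the number of lines read.

    Iterating over a file object stops at end of file, so the loop exits
    on its own once the writer on the other end of the pipe is gone.
    """
    count = 0
    for line in stream:
        process(line)
        count += 1
    return count


if __name__ == "__main__":
    seen = []
    n = run_step2(io.StringIO("a\nb\nc\n"), seen.append)
    print(n, seen)  # 3 ['a\n', 'b\n', 'c\n']
```

Taking the stream as a parameter is only a convenience for testing; the loop body is the same two lines as the idiom.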

The whole scheme could then look like this:

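import errno
import sys

try:
    for line in sys.stdin:
        process(line)
except IOError as e:  # "except IOError, e" is Python 2-only syntax
    if e.errno == errno.EPIPE:
        pass  # EPIPE error (broken pipe): handle it here, e.g. exit quietly
    else:
        raise  # any other error: re-raise it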
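One caveat worth knowing: EPIPE is reported when a process writes to a pipe whose read end has been closed, not when it reads. A reader whose writer has exited simply sees end of file. CPython ignores SIGPIPE at startup, so the failed write surfaces as BrokenPipeError, a subclass of OSError (and in Python 3, IOError is an alias of OSError). The sketch below reproduces this with os.pipe(); the function name write_to_closed_pipe is hypothetical.

```python
import errno
import os


def write_to_closed_pipe() -> int:
    """Write to a pipe with no reader left and return the resulting errno."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # the reading side (e.g. step2) is gone
    try:
        os.write(write_fd, b"data\n")
    except BrokenPipeError as e:  # raised because CPython ignores SIGPIPE
        return e.errno
    finally:
        os.close(write_fd)
    return 0  # only reached if the write unexpectedly succeeded


if __name__ == "__main__":
    print(write_to_closed_pipe() == errno.EPIPE)  # True
```

So the EPIPE branch is most relevant on the step1 (writing) side; on the step2 side, EOF is the signal to stop.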


Answer 2:

When step1 dies, you have a while loop with a try around a statement that will throw an exception. So you continuously try and fail, using 100% of the CPU, because readline won't block when it's throwing an exception.

Either add a delay between reads with time.sleep or, even better, pay attention to the errors readline is throwing: catch the specific error that is thrown when step1 stops and quit the program instead of trying to read from a dead pipe.
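In practice, when step1 exits, the reader usually sees end of file rather than an exception: readline() returns an empty string. A minimal sketch of quitting on that condition, applied to the question's while True loop, follows. An actual blank line in the input comes back as '\n', so it is not mistaken for EOF. The names step2_loop and handle are hypothetical, and io.StringIO stands in for sys.stdin.

```python
import io
from typing import Callable, TextIO


def step2_loop(stream: TextIO, handle: Callable[[str], None]) -> None:
    """Read lines until end of file, then return instead of spinning."""
    while True:
        line = stream.readline()
        if line == "":  # EOF: the writer (step1) has closed the pipe
            break
        handle(line)  # a blank input line is "\n", so it is still handled


if __name__ == "__main__":
    lines = []
    step2_loop(io.StringIO("x\n\ny\n"), lines.append)
    print(lines)  # ['x\n', '\n', 'y\n']
    # In the real script this would be step2_loop(sys.stdin, process).
```

Returning from the loop lets the script end normally; a sys.exit() call with a nonzero status would be an alternative if step2 should report that its input ended unexpectedly.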

You probably want to sleep when the pipe is empty and to exit when the pipe dies, but which exception is thrown, with what message, in each case I leave as an exercise for you to determine. The sleep isn't necessary in such a situation, but it will avoid other situations where you can hit high CPU usage doing useless work.