newbie python subprocess: “write error: Broken pip

2019-01-17 08:58发布

问题:

Thanks to the helpful suggestions below:

So it seems to be fixed when I

  1. separate commands into individual calls to Popen
  2. stderr=subprocess.PIPE as an argument to each Popen chain.

The New code:

import subprocess
import shlex
import logging

def run_shell_commands(cmds):
    """ Run commands and return output from last call to subprocess.Popen.
        For usage see the test below.
    """
    # split the commands
    cmds = cmds.split("|")
    cmds = list(map(shlex.split,cmds))

    logging.info('%s' % (cmds,))

    # run the commands
    stdout_old = None
    stderr_old = None
    p = []
    for cmd in cmds:
        logging.info('%s' % (cmd,))
        p.append(subprocess.Popen(cmd,stdin=stdout_old,stdout=subprocess.PIPE,stderr=subprocess.PIPE))
        stdout_old = p[-1].stdout
        stderr_old = p[-1].stderr
    return p[-1]


pattern = '"^85567      "'
file = "j"

cmd1 = 'grep %s %s | sort -g -k3 | head -10 | cut -d" " -f2,3' % (pattern, file)
p = run_shell_commands(cmd1)
out = p.communicate()
print(out)

Original Post:

I've spent too long trying to solve a problem piping a simple subprocess.Popen.

Code:

import subprocess
cmd = 'cat file | sort -g -k3 | head -20 | cut -f2,3' % (pattern,file)
p = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE)
for line in p.stdout:
    print(line.decode().strip())

Output for file ~1000 lines in length:

...
sort: write failed: standard output: Broken pipe
sort: write error

Output for file >241 lines in length:

...
sort: fflush failed: standard output: Broken pipe
sort: write error

Output for file <241 lines in length is fine.

I have been reading the docs and googling like mad but there is something fundamental about the subprocess module that I'm missing ... maybe to do with buffers. I've tried p.stdout.flush() and playing with the buffer size and p.wait(). I've tried to reproduce this with commands like 'sleep 20; cat moderatefile' but this seems to run without error.

回答1:

From the recipes on subprocess docs:

# To replace shell pipeline like output=`dmesg | grep hda`
p1 = Popen(["dmesg"], stdout=PIPE)
p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE)
output = p2.communicate()[0]


回答2:

This is because you shouldn't use "shell pipes" in the command passed to subprocess.Popen, you should use the subprocess.PIPE like this:

from subprocess import Popen, PIPE

p1 = Popen('cat file', stdout=PIPE)
p2 = Popen('sort -g -k 3', stdin=p1.stdout, stdout=PIPE)
p3 = Popen('head -20', stdin=p2.stdout, stdout=PIPE)
p4 = Popen('cut -f2,3', stdin=p3.stdout)
final_output = p4.stdout.read()

But i have to say that what you're trying to do could be done in pure python instead of calling a bunch of shell commands.



回答3:

I have been having the same error. Even put the pipe in a bash script and executed that instead of the pipe in Python. From Python it would get the broken pipe error, from bash it wouldn't.

It seems to me that perhaps the last command prior to the head is throwing an error as it's (the sort) STDOUT is closed. Python must be picking up on this whereas with the shell the error is silent. I've changed my code to consume the entire input and the error went away.

Would make sense also with smaller files working as the pipe probably buffers the entire output before head exits. This would explain the breaks on larger files.

e.g., instead of a 'head -1' (in my case, I was only wanting the first line), I did an awk 'NR == 1'

There are probably better ways of doing this depending on where the 'head -X' occurs in the pipe.



回答4:

You don't need shell=True. Don't invoke the shell. This is how I would do it:

p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
stdout_value = p.communicate()[0] 
stdout_value   # the output

See if you face the problem about the buffer after using this?



回答5:

try using communicate(), rather than reading directly from stdout.

the python docs say this:

"Warning Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process."

http://docs.python.org/library/subprocess.html#subprocess.Popen.stdout

p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
output =  p.communicate[0]
for line in output:
    # do stuff