
Why does python keep buffering stdout even when fl

Posted 2020-07-23 05:22

Question:

$ cat script.py
import sys

for line in sys.stdin:
    sys.stdout.write(line)
    sys.stdout.flush()

$ cat script.py - | python -u script.py

The output is right, but it only starts printing once I hit Ctrl-D, whereas the following starts printing right away:

$ cat script.py - | cat

which led me to think that the buffering does not come from cat.

I managed to get it working by doing:

for line in iter(sys.stdin.readline, ""):

as explained in "Streaming pipes in Python", but I don't understand why the former solution doesn't work as expected.

Answer 1:

Python manpage reveals the answer to your question:

   -u     Force stdin, stdout and stderr to be totally unbuffered. On systems where it matters, also put stdin, stdout and stderr in binary mode. Note that
          there is internal buffering in xreadlines(), readlines() and file-object iterators ("for line in sys.stdin") which is not influenced by this
          option. To work around this, you will want to use "sys.stdin.readline()" inside a "while 1:" loop.

That is: file-object iterators' internal buffering is to blame (and it doesn't go away with -u).
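
The workaround from the question can be wrapped up as a small helper. This is only a minimal sketch: the name echo_lines is made up for illustration, and it assumes text-mode streams where readline() returns "" at EOF.

```python
def echo_lines(src, dst):
    """Copy src to dst one line at a time, flushing after every line.

    echo_lines is a hypothetical helper name, not part of any library.
    """
    count = 0
    # iter(callable, sentinel) keeps calling src.readline() until it
    # returns "" (EOF), so the file-object iterator is never used.
    for line in iter(src.readline, ""):
        dst.write(line)
        dst.flush()
        count += 1
    return count
```

Calling echo_lines(sys.stdin, sys.stdout) in script.py (after import sys) would replace the original for loop; each line should then be echoed as soon as it arrives rather than after Ctrl-D.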



Answer 2:

cat does block buffering by default if its output goes to a pipe. So when you include - (stdin) in the cat command, it waits for EOF (your Ctrl-D closes the stdin stream) or 8K (probably) of data before outputting anything.

If you change the cat command to "cat script.py |" you'll see that it works as you expected.

Also, if you add 8K of comments to the end of script.py, it will immediately print it as well.

Edit:

The above is wrong. :-)

It turns out that file.next() (used by file iterators, i.e. for line in file) has a hidden read-ahead buffer that is not used by readline(), which simply reads one character at a time until it sees a newline or EOF.
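
To see that readline() does not need EOF before it returns, here is a rough sketch using an in-process pipe. The function name first_line_before_eof is hypothetical, and the payload is assumed to contain at least one newline (otherwise readline() would block). The read-ahead behaviour of file.next() described above belongs to Python 2 file objects, so this sketch only illustrates the readline() side.

```python
import os

def first_line_before_eof(payload):
    """Write payload (bytes ending in a newline) to a pipe and read one
    line back while the write end is still open.

    Hypothetical helper for illustration only.
    """
    read_fd, write_fd = os.pipe()
    reader = os.fdopen(read_fd, "r")
    try:
        # The write end stays open here, so the reader has not seen EOF.
        os.write(write_fd, payload)
        # readline() returns as soon as a complete line is available.
        return reader.readline()
    finally:
        os.close(write_fd)
        reader.close()
```

For example, first_line_before_eof(b"hello\nworld\n") hands back the first line even though the pipe has not been closed at the point of the read.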