In Python, what is the difference between f.readli

Posted 2019-03-26 04:34

Question:

In both the Python 2 tutorial and the Python 3 tutorial, there is a line midway through section 7.2.1 that says:

If you want to read all the lines of a file in a list you can also use list(f) or f.readlines().

So my question is: what is the difference between these two ways of turning a file object into a list? I am curious about both performance and the underlying Python object implementation (and perhaps any differences between Python 2 and Python 3).

Answer 1:

Functionally, there is no difference; both methods result in the exact same list.
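A quick way to check this yourself is sketched below. The temporary file and its contents are made up for illustration; any text file would do.

```python
import os
import tempfile

# Create a small sample file (hypothetical contents, for illustration only).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

# Build the list by iterating over the file object.
with open(path) as f:
    via_list = list(f)

# Build the list with the dedicated method.
with open(path) as f:
    via_readlines = f.readlines()

os.remove(path)

print(via_list == via_readlines)
```

Both lists keep the trailing newline on each line.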

Implementation-wise, one uses the file object as an iterator (calling next(f) repeatedly until StopIteration is raised), while the other uses a dedicated method to read the whole file.
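To make the iterator side concrete, here is a rough sketch of what list(f) amounts to. The helper name lines_via_iteration is hypothetical, and io.StringIO stands in for a real file so the example is self-contained.

```python
import io


def lines_via_iteration(f):
    """Collect lines by calling next(f) until StopIteration is raised."""
    lines = []
    while True:
        try:
            lines.append(next(f))
        except StopIteration:
            break
    return lines


sample = "alpha\nbeta\ngamma\n"

# An in-memory text stream behaves like a text file for this purpose.
manual = lines_via_iteration(io.StringIO(sample))
dedicated = io.StringIO(sample).readlines()

print(manual == dedicated)
```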

Python 2 and 3 differ in what that means, exactly, unless you use io.open() in Python 2. Python 2 file objects use a hidden buffer for file iteration, which can trip you up if you mix file object iteration and .readline() or .readlines() calls.

The io library (which handles all file I/O in Python 3) does not use such a hidden buffer; all buffering is instead handled by a wrapper class derived from BufferedIOBase (for example, BufferedReader). In fact, the io.IOBase.readlines() implementation uses the file object as an iterator under the hood anyway, and TextIOWrapper iteration delegates to TextIOWrapper.readline(), so list(f) and f.readlines() are essentially the same thing.
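The layering can be inspected directly in Python 3. The sketch below assumes a throwaway text file with made-up contents; it checks the wrapper types and shows that mixing iteration with readline() and readlines() still yields the lines in order.

```python
import io
import os
import tempfile

# Sample file with hypothetical contents, for illustration only.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\nfour\n")
    path = tmp.name

with open(path) as f:
    # In text mode, open() returns a TextIOWrapper around a buffered reader.
    is_text_wrapper = isinstance(f, io.TextIOWrapper)
    has_buffered_layer = isinstance(f.buffer, io.BufferedIOBase)

    # Mix iteration with readline() and readlines() on the same object.
    first = next(f)
    second = f.readline()
    rest = f.readlines()

os.remove(path)

print(is_text_wrapper, has_buffered_layer)
print([first, second] + rest)
```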

Performance-wise, there isn't really a difference even in Python 2, because the bottleneck is file I/O: how quickly you can read the data from disk. At a micro level, performance can depend on other factors, such as whether the OS has already buffered the data and how long the lines are.
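If you want to measure this on your own machine, a minimal timeit sketch follows. The file size (10,000 short lines) and the repeat count are arbitrary assumptions, and the timings will vary with the OS cache, the disk, and the line lengths, so treat the numbers as indicative only.

```python
import os
import tempfile
import timeit

# Sample file: 10,000 short lines (an arbitrary size chosen for illustration).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %d\n" % i for i in range(10000))
    path = tmp.name


def read_with_list():
    with open(path) as f:
        return list(f)


def read_with_readlines():
    with open(path) as f:
        return f.readlines()


# Time 20 full reads with each approach.
list_seconds = timeit.timeit(read_with_list, number=20)
readlines_seconds = timeit.timeit(read_with_readlines, number=20)
same_result = read_with_list() == read_with_readlines()

os.remove(path)

print("list(f):       %.4f s" % list_seconds)
print("f.readlines(): %.4f s" % readlines_seconds)
print("same result:", same_result)
```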