In both the Python 2 and Python 3 tutorials, there is a line midway through section 7.2.1 that says:
If you want to read all the lines of a file in a list you can also use list(f) or f.readlines().
So my question is: what is the difference between these two ways of turning a file object into a list? I am curious about both the performance and the underlying Python object implementation (and perhaps any differences between Python 2 and Python 3).
Functionally, there is no difference; both methods result in the exact same list.
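A quick way to convince yourself of this is to read the same file both ways and compare the results. This is only a minimal sketch: the sample contents and the temporary file are made up for the demo, and the f.seek(0) call is there because the first read leaves the file positioned at its end.

```python
import os
import tempfile

# Hypothetical sample data, written to a temporary file just for this demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    path = tmp.name

try:
    with open(path) as f:
        via_list = list(f)             # consumes the file by iterating over it
        f.seek(0)                      # rewind; the file is now at EOF
        via_readlines = f.readlines()  # dedicated "read every line" method
finally:
    os.remove(path)

# Both lists hold the same lines, newline characters included.
print(via_list == via_readlines)
```

Note that both approaches keep the trailing newline on each line, so neither one is a shortcut for stripping line endings.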
Implementation-wise, one uses the file object as an iterator (calling next(f) repeatedly until StopIteration is raised), while the other uses a dedicated method to read the whole file.
Python 2 and 3 differ in what that means, exactly, unless you use io.open() in Python 2. Python 2 file objects use a hidden buffer for file iteration, which can trip you up if you mix file object iteration with .readline() or .readlines() calls.
The io library (which handles all file I/O in Python 3) does not use such a hidden buffer; all buffering is instead handled by a BufferedIOBase() wrapper class. In fact, the io.IOBase.readlines() implementation uses the file object as an iterator under the hood anyway, and TextIOWrapper iteration delegates to TextIOWrapper.readline(), so list(f) and f.readlines() are essentially the same thing.
Performance-wise, there isn't really a difference even in Python 2, as the bottleneck is file I/O: how quickly you can read the data from disk. At a micro level, performance can depend on other factors, such as whether the OS has already buffered the data and how long the lines are.
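If you want to check this on your own machine, a rough timing sketch along these lines may help. It compares list(f), f.readlines(), and a hand-written next() loop that mimics what iteration does. The file size, the number of repetitions, and the helper names (with_list, with_readlines, with_next_loop) are all arbitrary choices for illustration. Because the OS will almost certainly have the small temporary file cached, this measures Python-level overhead rather than real disk I/O.

```python
import os
import tempfile
import timeit

# Hypothetical test file: 10,000 short lines (an arbitrary size for the demo).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.writelines("line %d\n" % i for i in range(10000))
    path = tmp.name

def with_list():
    with open(path) as f:
        return list(f)

def with_readlines():
    with open(path) as f:
        return f.readlines()

def with_next_loop():
    # Spells out the iterator protocol: call next(f) until StopIteration.
    lines = []
    with open(path) as f:
        while True:
            try:
                lines.append(next(f))
            except StopIteration:
                break
    return lines

try:
    # All three routes should produce the same list of lines.
    same = with_list() == with_readlines() == with_next_loop()
    timings = {
        fn.__name__: timeit.timeit(fn, number=20)
        for fn in (with_list, with_readlines, with_next_loop)
    }
finally:
    os.remove(path)

print(same)
for name, seconds in timings.items():
    print("%-16s %.4f s" % (name, seconds))
```

Treat the numbers as indicative only; they will vary with line length, file size, Python version, and whether the data is already in the OS cache.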