I want to read a huge file in my code. Is read() or readline() faster for this? How about the loop:
for line in fileHandle
If your file is a text file, then readlines() is the obvious way to read a file containing lines, though note that it loads the entire file into memory. Apart from that: run benchmarks if you are genuinely worried about performance. I doubt you will encounter any issues; the speed of the filesystem should be the limiting factor.
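A minimal sketch of the readlines() approach; io.StringIO and the sample contents are stand-ins for a real open() handle, just to keep the example self-contained:

```python
# readlines() returns every line of the file as a list, in one shot.
# Note that this loads the entire file into memory at once.
import io

f = io.StringIO("first line\nsecond line\n")  # stands in for open("data.txt")
lines = f.readlines()
print(lines)  # ['first line\n', 'second line\n']
```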
If the file is huge, read() is definitely a bad idea: without a size parameter, it loads the whole file into memory.
readline() reads only one line at a time, so I would say that is the better choice for huge files.
And just iterating over the file object should be as effective as using readline().
See http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects for more info
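A sketch of the line-at-a-time iteration described above; io.StringIO stands in for a real file handle so the example is self-contained:

```python
# Iterating over a file object is lazy: it yields one line per iteration
# rather than loading the whole file into memory.
import io

handle = io.StringIO("alpha\nbeta\ngamma\n")  # stands in for open("huge.txt")

count = 0
for line in handle:
    count += 1
print(count)  # 3
```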
read() basically tries to read the whole file and save it as a single string for later use, while readlines() also reads the whole file but does a split on "\n" and stores the lines in a list. Hence, neither method is preferred when the file size is excessively big.
readline() and the for loop (i.e. for line in file:) will read one line at a time and store it in a string. I would guess they take about the same time to finish the job if memory allows. However, these two are preferred when the file size is huge.
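The two line-at-a-time styles can be compared side by side; this sketch uses io.StringIO as a stand-in for an open file and shows that they produce the same lines:

```python
import io

data = "one\ntwo\nthree\n"

# Style 1: explicit readline() loop; readline() returns "" at end of file.
via_readline = []
f = io.StringIO(data)
while True:
    line = f.readline()
    if not line:
        break
    via_readline.append(line)

# Style 2: iterate directly over the file object.
via_iteration = list(io.StringIO(data))

print(via_readline == via_iteration)  # True
```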
The real difference between read() and readlines(): the read function simply loads the file as-is into memory, as a single string. The readlines method reads the file into a list of lines, each keeping its line termination. The readlines method should only be used on text files, and neither should be used on large files. When copying the contents of a text file, read() works well, because its output can be passed straight to the write function without the need to add line termination.
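The read()/write() copy described above can be sketched as follows; the file names and contents are illustrative, with io.StringIO standing in for real file handles:

```python
# Copying a text file with read()/write(): because read() returns the text
# with its newlines intact, write() can emit it without adding line
# termination.
import io

source = io.StringIO("line 1\nline 2\n")   # stands in for open("src.txt")
destination = io.StringIO()                # stands in for open("dst.txt", "w")

contents = source.read()                   # whole file as one string
destination.write(contents)                # newlines are already in place

print(destination.getvalue() == "line 1\nline 2\n")  # True
```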
The docs for readlines() indicate there is an optional sizehint argument. Because it is so vague, it's easy to overlook, but I found this to often be the fastest way to read files. Use readlines(1), which hints at one line but in fact reads in about 4k or 8k worth of lines, IIRC. This takes advantage of OS buffering and reduces the number of calls somewhat without using an excessive amount of memory.
You can experiment with different values of sizehint, but I found 1 to be optimal on my platform when I was testing this.
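The chunked-reading pattern this answer describes looks roughly like the sketch below. One caveat: in Python 3, readlines(hint) stops once the total size read exceeds the hint, so readlines(1) returns one complete line per call, while the buffered I/O layer underneath still fetches data from the OS in larger blocks; io.StringIO stands in for a real file here.

```python
# readlines(sizehint) chunking: each call returns a (possibly small) batch
# of complete lines; an empty list means end of file.
import io

f = io.StringIO("a\nb\nc\nd\n")

collected = []
while True:
    chunk = f.readlines(1)
    if not chunk:
        break
    collected.extend(chunk)

print(collected)  # ['a\n', 'b\n', 'c\n', 'd\n']
```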
For a text file, just iterating over it with a for loop is almost always the way to go. Never mind about speed; it is the cleanest. In some versions of Python, readline() really does read just a single line, while the for loop reads large chunks and splits them up into lines, so it may be faster. I think that more recent versions of Python use buffering for readline() as well, so the performance difference will be minuscule (for is probably still microscopically faster because it avoids a method call). However, choosing one over the other for performance reasons is probably premature optimisation.
Edit to add: I just checked back through some Python release notes. Python 2.5 said:
Python 2.6 introduced TextIOBase, which supports both iterating and readline() simultaneously. Python 2.7 fixed interleaving read() and readline().
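A quick sketch of the interleaving that modern io objects support, with io.StringIO standing in for an open file:

```python
# Mixing an explicit readline() with iteration on the same file object.
import io

f = io.StringIO("first\nsecond\nthird\n")

header = f.readline()          # explicit readline() for the first line
body = [line for line in f]    # then iterate over the rest

print(header, body)
```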