In Python, is read() or readlines() faster?

Posted 2019-01-07 15:10

I want to read a huge file in my code. Is read() or readline() faster for this? And what about the loop:

for line in fileHandle:

Tags: python io
8 answers
不美不萌又怎样
#2 · 2019-01-07 15:26

If your file is a text file, use readlines(), which is the obvious way to read a file made of lines. Apart from that, run benchmarks if you are genuinely worried about performance problems. I doubt you will run into any issues; the speed of the filesystem should be the limiting factor.
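
To act on the benchmarking advice, here is a minimal timing sketch. It assumes a throwaway sample file generated on the fly; the helper names (make_sample_file, with_read and so on) are hypothetical. Timings will vary with platform, Python version and OS caching, so no numbers are claimed here.

```python
import os
import tempfile
import timeit


def make_sample_file(n_lines=100_000):
    """Write a throwaway text file so the comparison has something to read."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as f:
        for i in range(n_lines):
            f.write(f"line {i}: some example text\n")
    return path


def with_read(path):
    # Whole file as one string, then split into lines.
    with open(path) as f:
        return len(f.read().splitlines())


def with_readlines(path):
    # Whole file as a list of lines.
    with open(path) as f:
        return len(f.readlines())


def with_readline(path):
    # One readline() call per line; '' signals end of file.
    count = 0
    with open(path) as f:
        while f.readline():
            count += 1
    return count


def with_for_loop(path):
    # Iterate over the file object directly.
    count = 0
    with open(path) as f:
        for _ in f:
            count += 1
    return count


METHODS = [with_read, with_readlines, with_readline, with_for_loop]

if __name__ == "__main__":
    path = make_sample_file()
    try:
        for fn in METHODS:
            seconds = timeit.timeit(lambda: fn(path), number=5)
            print(f"{fn.__name__:15s} {seconds:.3f}s")
    finally:
        os.remove(path)
```

All four functions count the same lines, so any difference in the printed times comes from the reading strategy rather than from the work done per line.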

Fickle 薄情
#3 · 2019-01-07 15:30

If the file is huge, read() is definitely a bad idea, as it loads the whole file into memory (when called without a size argument).
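
The "without size parameter" caveat matters because read(size) returns at most size bytes per call, so a huge file can be processed in bounded memory. A minimal sketch, assuming a binary file and a hypothetical 64 KB chunk size; count_bytes_in_chunks is an illustrative name, not a library function.

```python
import os
import tempfile

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; tune for your platform


def count_bytes_in_chunks(path, chunk_size=CHUNK_SIZE):
    """Read a file in fixed-size pieces so only one chunk is held in memory."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # b"" signals end of file
                break
            total += len(chunk)  # replace with real per-chunk processing
    return total


if __name__ == "__main__":
    # Demo on a ~1 MB temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 1_000_000)
    try:
        print(count_bytes_in_chunks(tmp.name))  # 1000000
    finally:
        os.remove(tmp.name)
```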

readline() reads only one line at a time, so I would say it is the better choice for huge files.

And simply iterating over the file object should be as efficient as using readline().
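
The two line-at-a-time approaches can be sketched side by side. This is a minimal illustration, assuming io.StringIO as a stand-in for a file opened with open(); the generator names are hypothetical. Both yield the same lines, each with its trailing newline.

```python
import io


def lines_via_readline(f):
    """Yield lines by calling readline() until it returns '' at end of file."""
    while True:
        line = f.readline()
        if not line:  # '' only at EOF; a blank line is '\n', which is truthy
            break
        yield line


def lines_via_iteration(f):
    """Yield lines by iterating over the file object itself."""
    for line in f:
        yield line


if __name__ == "__main__":
    sample = "first\nsecond\n\nlast line without newline"
    # io.StringIO stands in for a file opened with open(path).
    print(list(lines_via_readline(io.StringIO(sample))))
    # ['first\n', 'second\n', '\n', 'last line without newline']
```

The list() call is only there to display the result; in real code you would process each line inside the loop so that only one line is held in memory at a time.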

See http://docs.python.org/tutorial/inputoutput.html#methods-of-file-objects for more info

在下西门庆
#4 · 2019-01-07 15:31

read() reads the whole file and stores it in a single string for later use, while readlines() also reads the whole file but splits it into lines (each line keeps its trailing newline, so this is not quite the same as split("\n")) and stores them in a list. Hence neither method is a good choice when the file is very large.
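
A small sketch of what each call returns, assuming io.StringIO as a stand-in for a text file opened with open(). It shows that readlines() keeps the newline on each line, which is why it matches splitlines(keepends=True) for plain text like this rather than split("\n").

```python
import io

text = "alpha\nbeta\ngamma\n"

whole = io.StringIO(text).read()       # one string, newlines included
lines = io.StringIO(text).readlines()  # list of lines, each keeps its "\n"

print(repr(whole))                      # 'alpha\nbeta\ngamma\n'
print(lines)                            # ['alpha\n', 'beta\n', 'gamma\n']
print(whole.split("\n"))                # ['alpha', 'beta', 'gamma', '']
print(whole.splitlines(keepends=True))  # ['alpha\n', 'beta\n', 'gamma\n']
```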

readline() and the for loop (i.e. for line in file:) read one line at a time into a string. I expect they take about the same time to finish the job if memory allows. However, these two are the preferred options when the file is huge.

甜甜的少女心
#5 · 2019-01-07 15:40

The real difference between read() and readlines(): the read() function simply loads the file as-is into memory as one string, while the readlines() method reads the file as a list of lines, each still ending with its line terminator (except possibly the last one). The readlines() method is really meant for text files, and neither should be used on large files. If you are copying the contents of a text file, read() works well, because its result can be passed straight to write() without adding any line terminators (writelines() can likewise write the result of readlines() back unchanged).
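
A minimal sketch of copying a small text file both ways; the function names and temporary paths are hypothetical. Both versions hold the whole file in memory, so they share the large-file caveat above.

```python
import os
import tempfile


def copy_with_read(src, dst):
    """Copy a small text file: read() gives one string that write() emits unchanged."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.write(fin.read())  # whole file held in memory at once


def copy_with_readlines(src, dst):
    """Same copy via readlines()/writelines(); lines already end with a newline."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.writelines(fin.readlines())  # also holds the whole file in memory


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, "source.txt")
    with open(src, "w") as f:
        f.write("one\ntwo\nthree\n")
    for copier in (copy_with_read, copy_with_readlines):
        dst = os.path.join(workdir, copier.__name__ + ".txt")
        copier(src, dst)
        with open(dst) as f:
            print(copier.__name__, repr(f.read()))
```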

Anthone
#6 · 2019-01-07 15:45

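The docs for readlines() mention an optional sizehint argument. Because it is described so vaguely, it's easy to overlook, but I found it to often be the fastest way to read files. Use readlines(1): the hint is a size in bytes, not a line count, and Python 2's file objects round it up to an internal buffer size, so the call in fact reads about 4k or 8k worth of lines IIRC. This takes advantage of OS buffering and reduces the number of calls somewhat without using an excessive amount of memory. (In Python 3 the argument is called hint and is honoured more literally, so readlines(1) returns roughly one line.)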

You can experiment with different values of sizehint, but I found 1 to be optimal on my platform when I was testing this.
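
On Python 3, where readlines(1) no longer batches, the same idea can be approximated by passing an explicit hint. This is a sketch under assumptions: the 64 KB default is a hypothetical starting point to tune, iter_line_batches is an illustrative name, and io.StringIO stands in for a real file (for text files the hint counts characters, for binary files bytes).

```python
import io


def iter_line_batches(f, hint=64 * 1024):
    """Yield lists of whole lines, each batch totalling roughly `hint` characters.

    readlines(hint) stops once the lines read so far exceed `hint`, so memory
    use per batch stays around `hint` plus one line.
    """
    while True:
        batch = f.readlines(hint)
        if not batch:  # an empty list means end of file
            break
        yield batch


if __name__ == "__main__":
    sample = io.StringIO("".join(f"row {i}\n" for i in range(10_000)))
    batches = list(iter_line_batches(sample, hint=4096))
    print(sum(len(b) for b in batches))  # 10000
```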

相关推荐>>
#7 · 2019-01-07 15:46

For a text file, just iterating over it with a for loop is almost always the way to go. Never mind speed; it is the cleanest option.

In some versions of Python, readline() really does read just a single line, while the for loop reads large chunks and splits them into lines, so the loop may be faster. I think more recent versions of Python also use buffering for readline(), so the performance difference will be minuscule (the for loop is probably still microscopically faster because it avoids a method call). However, choosing one over the other for performance reasons is probably premature optimisation.

Edit to add: I just checked back through some Python release notes. Python 2.5 said:

It’s now illegal to mix iterating over a file with for line in file and calling the file object’s read()/readline()/readlines() methods.

Python 2.6 introduced TextIOBase, which supports both iterating and readline() simultaneously.

Python 2.7 fixed interleaving read() and readline().
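
As an illustration of interleaving readline() with iteration, which in my understanding works on Python 3 text files, here is a minimal sketch: read a header line with readline(), then loop over the rest. io.StringIO stands in for a real file, and parse_with_header is a hypothetical helper.

```python
import io


def parse_with_header(f):
    """Read a header line with readline(), then iterate over the remaining lines."""
    header = f.readline().rstrip("\n").split(",")
    rows = [line.rstrip("\n").split(",") for line in f]
    return header, rows


sample = io.StringIO("name,qty\napple,3\npear,5\n")
header, rows = parse_with_header(sample)
print(header)  # ['name', 'qty']
print(rows)    # [['apple', '3'], ['pear', '5']]

# readline() can also be called inside the loop to consume lines in pairs.
paired = io.StringIO("a\nb\nc\nd\n")
pairs = [(line.strip(), paired.readline().strip()) for line in paired]
print(pairs)   # [('a', 'b'), ('c', 'd')]
```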
