I need to read a large file line by line. Let's say the file is more than 5 GB and I need to read each line, but obviously I do not want to use readlines()
because it would create a very large list in memory.
How would the code below work in this case? Does xreadlines
itself read one line at a time into memory? Is the generator expression even needed?
f = (line for line in open("log.txt").xreadlines())  # how much is loaded in memory?
f.next()  # (Python 2 syntax; in Python 3 this would be next(f))
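For reference, the idiomatic way to do this is to iterate over the file object itself, which yields one line at a time from an internal buffer, so neither xreadlines() nor the generator expression is needed. A minimal self-contained sketch (the sample file and its contents are invented for illustration):

```python
import os
import tempfile

# Write a small sample file so the sketch is self-contained;
# in practice this would be the multi-gigabyte log from the question.
path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(path, "w") as f:
    f.write("first\nsecond\nthird\n")

# A file object is its own lazy iterator: each loop step reads only
# one line (plus a small read buffer), never the whole file.
lines_seen = []
with open(path) as f:
    for line in f:
        lines_seen.append(line.rstrip("\n"))
```

The `with` block also guarantees the file is closed, which the bare `open(...)` in the question does not.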
Also, what can I do to read the file in reverse order, like the Linux tail
command?
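As a rough illustration of what a tail-style backward read involves (a hypothetical sketch, not the code from pytailer or the linked answer), one can seek to the end of the file and read fixed-size chunks backwards until enough newlines have been collected:

```python
import os
import tempfile

def tail(path, n, chunk=1024):
    """Return the last n lines of a file, like the Unix tail command,
    by reading backwards in fixed-size chunks so memory use stays
    bounded regardless of file size."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Walk backwards until we have seen more than n newlines or
        # reached the start of the file.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    return [line.decode() for line in data.splitlines()[-n:]]

# Demo on a small throwaway file (a real log would be gigabytes).
demo = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(demo, "w") as f:
    f.write("".join("line %d\n" % i for i in range(1, 101)))

last_three = tail(demo, 3)
```

Reading in binary mode and decoding at the end avoids splitting a multi-byte character mid-chunk while scanning; only the final selected lines are decoded.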
I found:
http://code.google.com/p/pytailer/
and
"python head, tail and backward read by lines of a text file"
Both worked very well!
I demonstrated a parallel, byte-level random-access approach in this other question:
Getting number of lines in a text file without readlines
Some of the answers already provided are nice and concise, and I like several of them. But the right approach really depends on what you want to do with the data in the file. In my case I just wanted to count lines as fast as possible in big text files. The code can of course be adapted to do other things as well.
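As a minimal sketch of the line-counting idea (an illustration, not the actual code from the linked answer), tallying newline bytes in fixed-size binary chunks keeps memory use bounded however large the file is, and avoids per-line Python overhead:

```python
import os
import tempfile

def count_lines(path, bufsize=1 << 20):
    """Count lines by tallying newline bytes in fixed-size binary
    chunks; memory use is bounded by bufsize, not by file size."""
    count = 0
    with open(path, "rb") as f:
        while True:
            buf = f.read(bufsize)
            if not buf:
                break
            count += buf.count(b"\n")
    return count

# Demo on a small throwaway file.
demo = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(demo, "w") as f:
    f.write("one\ntwo\nthree\n")

n = count_lines(demo)
```

Note this counts newline characters, so a final line without a trailing newline would not be counted; whether that matters depends on how your files are produced.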