I need to read a large file, line by line. Let's say the file is larger than 5 GB and I need to read each line, but obviously I do not want to use readlines(), because it will create a very large list in memory.
How will the code below work in this case? Does xreadlines itself read one line at a time into memory? Is the generator expression needed?
f = (line for line in open("log.txt").xreadlines()) # how much is loaded in memory?
f.next()
Also, what can I do to read the file in reverse order, just like the Linux tail command?
I found:
http://code.google.com/p/pytailer/
and
"python head, tail and backward read by lines of a text file"
Both worked very well!
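In case those links go dead, here is a minimal sketch of the same idea, reading the file backwards in fixed-size blocks (the function name and block size are my own choices, not pytailer's API):

import os

def read_lines_reversed(path, block_size=4096):
    # Yield the lines of a text file from last to first, holding only
    # one block plus any partial line in memory at a time.
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        buffer = b''
        while position > 0:
            read_size = min(block_size, position)
            position -= read_size
            f.seek(position)
            buffer = f.read(read_size) + buffer
            lines = buffer.split(b'\n')
            # The first piece may be an incomplete line; keep it in the
            # buffer until the next (earlier) block is read.
            buffer = lines.pop(0)
            for line in reversed(lines):
                yield line.decode('utf-8')
        yield buffer.decode('utf-8')

for line in read_lines_reversed('log.txt'):
    print(line)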
Hope this helps.
An old-school approach:
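A sketch of the classic readline() loop this usually means (log.txt and the print call are placeholders for your own file and per-line work):

fh = open('log.txt', 'rt')
line = fh.readline()
while line:
    print(line, end='')   # stand-in for real per-line processing
    line = fh.readline()
fh.close()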
Thank you! I recently converted to Python 3 and have been frustrated by using readlines() to read large files. This solved the problem. But to get each line, I had to do a couple of extra steps. Each line was preceded by a "b'", which I guess means it was in bytes format. Using decode('utf-8') converted it to a string.
Then I had to remove an "=\n" in the middle of each line.
Then I split the lines at the newlines.
Here is the code, starting just above "print data" in Arohi's code. Please try this:
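A sketch of those steps (the sample bytes stand in for the data variable read in Arohi's code):

data = b"first part=\nsecond part\nanother line\n"  # stand-in for the bytes read in Arohi's code
data = data.decode('utf-8')      # bytes -> str, so the b'' prefix disappears
data = data.replace('=\n', '')   # drop the "=\n" soft breaks inside lines
lines = data.split('\n')         # split at the real newlines
for line in lines:
    print(line)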
You are better off using an iterator instead. Relevant: http://docs.python.org/library/fileinput.html
From the docs:
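The example there is essentially this (the filename is a placeholder, and print stands in for whatever you do per line):

import fileinput

for line in fileinput.input(['myfile.txt']):
    print(line, end='')   # lines are yielded one at a time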
This will avoid copying the whole file into memory at once.
How about this? Divide your file into chunks and process it chunk by chunk. When you read a file, your operating system caches the data just ahead of where you are reading, and going line by line does not make efficient use of that cached information.
Instead, load a whole chunk into memory at a time and then do your line processing on it, as sketched below.
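A sketch of that approach: read a fixed-size chunk, split it into lines, and carry any incomplete final line over into the next chunk (the chunk size and filename are arbitrary choices):

def read_in_chunks(path, chunk_size=1024 * 1024):
    # Yield complete lines while reading the file one chunk at a time.
    with open(path, 'r') as f:
        leftover = ''
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            chunk = leftover + chunk
            lines = chunk.split('\n')
            # The last piece may be a partial line; save it until the
            # next chunk completes it.
            leftover = lines.pop()
            for line in lines:
                yield line
        if leftover:
            yield leftover

for line in read_in_chunks('log.txt'):
    print(line)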