I need to get a line count of a large file (hundreds of thousands of lines) in Python. What is the most efficient way, both memory- and time-wise?
At the moment I do:
    def file_len(fname):
        with open(fname) as f:
            for i, l in enumerate(f):
                pass
        return i + 1
Is it possible to do any better?
One line, probably pretty fast:
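A minimal sketch of such a one-liner (assuming a generator expression; myfile.txt is a placeholder):

    # Each iteration yields one line; sum(1 for ...) counts them without
    # ever holding more than one line in memory at a time.
    num_lines = sum(1 for _ in open('myfile.txt'))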
A one-line bash solution similar to this answer, using the modern subprocess.check_output function:
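A minimal sketch of that approach, assuming a Unix-like system where wc is on the PATH:

    import subprocess

    def file_len(fname):
        # wc prints "<count> <filename>"; take the first field of the
        # output and convert it to an integer.
        return int(subprocess.check_output(['wc', '-l', fname]).split()[0])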
I believe that a memory-mapped file will be the fastest solution. I tried four functions: the function posted by the OP (opcount); a simple iteration over the lines in the file (simplecount); readline with a memory-mapped file (mmap) (mapcount); and the buffer read solution offered by Mykola Kharechko (bufcount).

I ran each function five times and calculated the average run-time for a 1.2 million-line text file.
Windows XP, Python 2.5, 2GB RAM, 2 GHz AMD processor
Here are my results:
Edit: numbers for Python 2.6:
So the buffer read strategy seems to be the fastest for Windows/Python 2.6.
Here is the code:
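The listing below is a sketch reconstructed from the descriptions above; the function names come from the answer, but details such as the 1 MB buffer size are assumptions:

    import mmap

    def opcount(fname):
        # The OP's version: enumerate the file object line by line.
        with open(fname) as f:
            for i, _ in enumerate(f):
                pass
        return i + 1

    def simplecount(fname):
        # Simple iteration over the lines in the file.
        lines = 0
        for _ in open(fname):
            lines += 1
        return lines

    def mapcount(fname):
        # readline() over a memory-mapped view of the file.
        with open(fname, 'rb') as f:
            buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            lines = 0
            while buf.readline():
                lines += 1
            return lines

    def bufcount(fname):
        # The buffer read strategy: pull fixed-size chunks and count
        # the newline bytes in each.
        lines = 0
        buf_size = 1024 * 1024
        with open(fname, 'rb') as f:
            buf = f.read(buf_size)
            while buf:
                lines += buf.count(b'\n')
                buf = f.read(buf_size)
        return lines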
What about this?
How about this?