I need to get a line count of a large file (hundreds of thousands of lines) in Python. What is the most efficient way, both memory- and time-wise?
At the moment I do:
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
Is it possible to do any better?
Just to complete the methods above, I tried a variant with the fileinput module:
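A minimal sketch of such a fileinput-based variant (the helper name file_len_fileinput is mine, not from the original answer):

    import fileinput

    def file_len_fileinput(fname):
        # Iterate over the file via the fileinput module and count lines.
        count = 0
        for _ in fileinput.input([fname]):
            count += 1
        fileinput.close()
        return count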
And passed a 60-million-line file to all of the methods stated above:
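A rough sketch of how such a timing comparison could be run, with placeholder function and file names (no numbers are claimed here):

    import timeit

    def time_method(func, fname, repeat=3):
        # Time one line-counting function against one file,
        # keeping the best of a few runs.
        return min(timeit.repeat(lambda: func(fname), number=1, repeat=repeat))

    # for name, func in [("enumerate", file_len), ("fileinput", file_len_fileinput)]:
    #     print(name, time_method(func, "big_file.txt"))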
It's a bit of a surprise to me that fileinput is that bad and scales far worse than all the other methods...
Why not read the first 100 and the last 100 lines, estimate the average line length, and then divide the total file size by that number? If you don't need an exact value, this could work.
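A rough sketch of that estimation idea (for simplicity this samples only the head of the file; the names are illustrative, not from the original):

    import os
    from itertools import islice

    def estimate_line_count(fname, sample_lines=100):
        # Estimate the line count from the average length of a sample of
        # lines and the total file size.
        total_size = os.path.getsize(fname)
        with open(fname, 'rb') as f:
            sample = list(islice(f, sample_lines))
        if not sample:
            return 0
        avg_len = sum(len(line) for line in sample) / len(sample)
        return int(total_size / avg_len)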
If one wants to get the line count cheaply in Python on Linux, I recommend this method:
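A minimal sketch, assuming the recommendation is to shell out to the Unix wc -l command via subprocess (this may not be the exact original snippet):

    import subprocess

    def line_count(file_path):
        # Ask wc -l for the count; its output looks like "123456 file_path".
        out = subprocess.check_output(['wc', '-l', file_path])
        return int(out.split()[0])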
file_path can be either an absolute or a relative path. Hope this helps.
Kyle's answer is probably best; an alternative, and a way to compare the performance of both, are sketched below:
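Assuming Kyle's answer refers to the sum-over-a-generator idiom, a sketch of both approaches and of how they could be timed (the "alternative" shown and the file name are my placeholders):

    import timeit

    # Kyle's approach (as I understand it): count lines lazily with a generator.
    def count_sum(fname):
        with open(fname) as f:
            return sum(1 for _ in f)

    # One possible alternative: read everything and split on line boundaries.
    def count_splitlines(fname):
        with open(fname) as f:
            return len(f.read().splitlines())

    # if __name__ == '__main__':
    #     for func in (count_sum, count_splitlines):
    #         print(func.__name__,
    #               min(timeit.repeat(lambda: func('my_file.txt'), number=1, repeat=3)))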
Here is what I use; it seems pretty clean:
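A minimal sketch along those lines, assuming (as the UPDATE below suggests) a subprocess call to wc -l; this is an approximation, not necessarily the exact original code:

    import subprocess

    def file_len(fname):
        # Run wc -l in a child process and parse the first field of its output.
        p = subprocess.Popen(['wc', '-l', fname],
                             stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        result, err = p.communicate()
        if p.returncode != 0:
            raise IOError(err)
        return int(result.strip().split()[0])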
UPDATE: This is marginally faster than using pure Python, but at the cost of memory usage: subprocess will fork a new process with the same memory footprint as the parent process while it executes your command.