I need to get a line count of a large file (hundreds of thousands of lines) in Python. What is the most efficient way both memory- and time-wise?
At the moment I do:
def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1
Is it possible to do any better?
How about this one-liner:

count = max(enumerate(open(filename)))[0] + 1  # enumerate is zero-based, so the last index + 1 is the line count
Takes 0.003 sec on a 3900-line file, using the method sketched below to time it.
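The timing code itself isn't shown above; a minimal sketch with time.perf_counter (timed_count and filename are illustrative names) could look like:

import time

def timed_count(filename):
    start = time.perf_counter()
    # same idea as the one-liner above; the default handles an empty file
    with open(filename) as f:
        count = max(enumerate(f), default=(-1, None))[0] + 1
    print(f"{count} lines in {time.perf_counter() - start:.3f} sec")
    return count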
This code is shorter and clearer; it's probably the best way.
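The snippet meant here is presumably the generator-expression idiom from this thread; a minimal sketch of it, with 'myfile.txt' as a placeholder path:

with open('myfile.txt') as f:  # placeholder path
    num_lines = sum(1 for _ in f)  # add 1 per line, consuming the file lazily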
This is the fastest thing I have found using pure Python. You can use whatever amount of memory you want by setting the buffer size, though 2**16 appears to be a sweet spot on my computer.
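A sketch of that buffered approach, with fname and buf_size as illustrative names:

from functools import partial

fname = 'myfile.txt'  # placeholder path
buf_size = 2**16      # the sweet spot mentioned above
with open(fname) as f:
    # iter(callable, sentinel) calls f.read(buf_size) until it returns ''
    # note: this counts newline characters, so a final unterminated line is missed
    print(sum(chunk.count('\n') for chunk in iter(partial(f.read, buf_size), '')))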
I found the answer in "Why is reading lines from stdin much slower in C++ than Python?" and tweaked it just a tiny bit. It's a very good read to understand how to count lines quickly, though wc -l is still about 75% faster than anything else.
For me, this variant is the fastest.
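A sketch of that buffered variant, with count_lines, fname, and buf_size as illustrative names:

def count_lines(fname, buf_size=1024 * 1024):
    # read fixed-size chunks and count newline characters in each
    lines = 0
    with open(fname) as f:
        read_f = f.read  # bind the method once; a small loop optimization
        buf = read_f(buf_size)
        while buf:
            lines += buf.count('\n')
            buf = read_f(buf_size)
    return lines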
Reasons: buffering is faster than reading line by line, and str.count is also very fast.