How to get line count cheaply in Python?

Posted 2018-12-31 03:20

I need to get a line count of a large file (hundreds of thousands of lines) in Python. What is the most efficient way, both memory- and time-wise?

At the moment I do:

def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

Is it possible to do any better?

30 answers
若你有天会懂
#2 · 2018-12-31 03:36

count = max(enumerate(open(filename)))[0] + 1  # enumerate is 0-based, so add 1

回忆,回不去的记忆
#3 · 2018-12-31 03:37
print(open('file.txt', 'r').read().count("\n") + 1)
刘海飞了
#4 · 2018-12-31 03:37

How about this one-liner:

file_length = len(open('myfile.txt','r').read().split('\n'))

It takes 0.003 seconds on a 3,900-line file, timed with this function:

import time

def c():
    s = time.time()
    file_length = len(open('myfile.txt', 'r').read().split('\n'))
    print(time.time() - s)
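
A single time.time() reading on a file this small is easily dominated by noise. As a rough sketch (not part of the original answer), the same one-liner could be timed with the standard timeit module over several repetitions. The helper names count_by_split and time_counter, and the temporary 3,900-line sample file, are assumptions made for illustration.

```python
import os
import tempfile
import timeit

def count_by_split(path):
    # Same idea as the one-liner above, but the file is closed afterwards.
    with open(path) as f:
        return len(f.read().split('\n'))

def time_counter(func, path, repeat=5, number=10):
    # Run `number` calls per measurement, `repeat` measurements in total,
    # and report the best average time per call in seconds.
    timings = timeit.repeat(lambda: func(path), repeat=repeat, number=number)
    return min(timings) / number

# Hypothetical sample data: 3,900 short lines, each ending with a newline.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('line\n' * 3900)
    sample_path = tmp.name

sample_count = count_by_split(sample_path)
best_seconds = time_counter(count_by_split, sample_path)
os.remove(sample_path)

print(sample_count, best_seconds)
```

Note that split('\n') yields one more element than there are newline characters, so a file whose last line ends with a newline is reported one line too high.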
零度萤火
#5 · 2018-12-31 03:38

This code is shorter and clearer. It's probably the best way:

num_lines = open('yourfile.ext').read().count('\n')
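
The one-liners in this thread do not all agree: count('\n'), count('\n') + 1, len(split('\n')) and plain iteration give different results depending on whether the last line ends with a newline. The sketch below compares them on two in-memory strings. The function name count_variants and the sample strings are made up for illustration.

```python
import io

def count_variants(text):
    # Apply each counting expression from the answers above to the same text.
    return {
        'count': text.count('\n'),
        'count_plus_one': text.count('\n') + 1,
        'split': len(text.split('\n')),
        # Iterating a file-like object yields one item per line.
        'iteration': sum(1 for _ in io.StringIO(text)),
    }

with_newline = count_variants('a\nb\nc\n')    # three lines, newline-terminated
without_newline = count_variants('a\nb\nc')   # three lines, no final newline

print(with_newline)
print(without_newline)
```

In this sketch only iteration reports 3 in both cases. count('\n') is correct when the file ends with a newline, while the + 1 and split variants are correct when it does not.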
伤终究还是伤i
#6 · 2018-12-31 03:40

This is the fastest approach I have found using pure Python. You can use as much or as little memory as you want by setting buffer, though 2**16 appears to be the sweet spot on my computer.

from functools import partial

buffer = 2**16
with open(myfile) as f:
    # Read fixed-size chunks until read() returns '' at end of file.
    print(sum(x.count('\n') for x in iter(partial(f.read, buffer), '')))

I found the answer in the question "Why is reading lines from stdin much slower in C++ than Python?" and tweaked it slightly. It's a very good read for understanding how to count lines quickly, though wc -l is still about 75% faster than anything else.
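
A possible variation, sketched here under the assumption of Python 3: opening the file in binary mode skips text decoding, and remembering the last byte read makes it possible to count a final line that has no trailing newline. The names count_lines and chunk_size are hypothetical and not from the linked question.

```python
from functools import partial

def count_lines(path, chunk_size=2**16):
    """Count lines by reading fixed-size binary chunks."""
    newlines = 0
    last_byte = b''
    with open(path, 'rb') as f:
        # iter(callable, sentinel) calls f.read(chunk_size) until it returns b''.
        for chunk in iter(partial(f.read, chunk_size), b''):
            newlines += chunk.count(b'\n')
            last_byte = chunk[-1:]
    # A non-empty file that does not end with a newline has one more line.
    if last_byte and last_byte != b'\n':
        newlines += 1
    return newlines
```

Called as count_lines('myfile.txt'), this returns 0 for an empty file, which the enumerate-based function in the question cannot do because i is never assigned.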

人间绝色
#7 · 2018-12-31 03:40

For me, this variant is the fastest:

#!/usr/bin/env python

def main():
    lines = 0
    buf_size = 1024 * 1024

    with open('filename') as f:
        read_f = f.read  # loop optimization: bind the method once

        buf = read_f(buf_size)
        while buf:
            lines += buf.count('\n')
            buf = read_f(buf_size)

    print(lines)

if __name__ == '__main__':
    main()

Reasons: reading in large buffers is faster than reading line by line, and str.count is also very fast.
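
One way to check this claim on a given machine is to run both strategies against the same file and compare. The following is only a sketch: the function names, the generated 200,000-line file and the repetition count are all assumptions, and the timings will vary with the system and the file.

```python
import os
import tempfile
import timeit

def count_by_lines(path):
    # Baseline: let the file object split the data into lines.
    with open(path, 'rb') as f:
        return sum(1 for _ in f)

def count_by_chunks(path, buf_size=1024 * 1024):
    # Buffered variant: read large blocks and count newline bytes.
    lines = 0
    with open(path, 'rb') as f:
        buf = f.read(buf_size)
        while buf:
            lines += buf.count(b'\n')
            buf = f.read(buf_size)
    return lines

# Hypothetical test data: 200,000 newline-terminated lines.
fd, demo_path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'some text on a line\n' * 200000)

line_result = count_by_lines(demo_path)
chunk_result = count_by_chunks(demo_path)
line_seconds = timeit.timeit(lambda: count_by_lines(demo_path), number=5)
chunk_seconds = timeit.timeit(lambda: count_by_chunks(demo_path), number=5)
os.remove(demo_path)

print(line_result, chunk_result)
print('line by line: %.4f s, chunks: %.4f s' % (line_seconds, chunk_seconds))
```

Both functions should return the same count whenever the file ends with a newline, so any difference in the printed timings comes from the reading strategy alone.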
