How to get line count cheaply in Python?-第5页回答

I need to get a line count of a large file (hundreds of thousands of lines) in python. What is the most efficient way both memory- and time-wise?

At the moment I do:

def file_len(fname):
    with open(fname) as f:
        for i, l in enumerate(f):
            pass
    return i + 1

is it possible to do any better?

标签： python text-files line-count

30条回答

弹指情弦暗扣

2楼-- · 2018-12-31 03:56

Just to complete the above methods I tried a variant with the fileinput module:

import fileinput as fi   
def filecount(fname):
        for line in fi.input(fname):
            pass
        return fi.lineno()

And passed a 60mil lines file to all the above stated methods:

mapcount : 6.1331050396
simplecount : 4.588793993
opcount : 4.42918205261
filecount : 43.2780818939
bufcount : 0.170812129974

It's a little surprise to me that fileinput is that bad and scales far worse than all the other methods...

0人赞添加讨论(0) 举报

深知你不懂我心

3楼-- · 2018-12-31 03:59

Why not read the first 100 and the last 100 lines and estimate the average line length, then divide the total file size through that numbers? If you don't need a exact value this could work.

0人赞添加讨论(0) 举报

刘海飞了

4楼-- · 2018-12-31 04:00

If one wants to get the line count cheaply in Python in Linux, I recommend this method:

import os
print os.popen("wc -l file_path").readline().split()[0]

file_path can be both abstract file path or relative path. Hope this may help.

0人赞添加讨论(0) 举报

一个人的天荒地老

5楼-- · 2018-12-31 04:00

def line_count(path):
    count = 0
    with open(path) as lines:
        for count, l in enumerate(lines, start=1):
            pass
    return count

0人赞添加讨论(0) 举报

明月照影归

6楼-- · 2018-12-31 04:02

Kyle's answer

num_lines = sum(1 for line in open('my_file.txt'))

is probably best, an alternative for this is

num_lines =  len(open('my_file.txt').read().splitlines())

Here is the comparision of performance of both

In [20]: timeit sum(1 for line in open('Charts.ipynb'))
100000 loops, best of 3: 9.79 µs per loop

In [21]: timeit len(open('Charts.ipynb').read().splitlines())
100000 loops, best of 3: 12 µs per loop

0人赞添加讨论(0) 举报

浮光初槿花落

7楼-- · 2018-12-31 04:02

Here is what I use, seems pretty clean:

import subprocess

def count_file_lines(file_path):
    """
    Counts the number of lines in a file using wc utility.
    :param file_path: path to file
    :return: int, no of lines
    """
    num = subprocess.check_output(['wc', '-l', file_path])
    num = num.split(' ')
    return int(num[0])

UPDATE: This is marginally faster than using pure python but at the cost of memory usage. Subprocess will fork a new process with the same memory footprint as the parent process while it executes your command.

0人赞添加讨论(0) 举报

上一页 1 2 3 4 5

How to get line count cheaply in Python?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间