counting non-empty lines and sum of lengths of tho

Posted 2019-07-24 10:25

Question:

I am trying to create a function that takes a filename and returns a 2-tuple containing the number of non-empty lines in that program and the sum of the lengths of all those lines. Here is my current program:

def code_metric(file):
    with open(file, 'r') as f: 
        lines = len(list(filter(lambda x: x.strip(), f)))
        num_chars = sum(map(lambda l: len(re.sub('\s', '', l)), f))

    return(lines, num_chars)

The result I get if I run:

if __name__ == "__main__":
    print(code_metric('cmtest.py'))

is

(3, 0)

when it should be:

(3,85)

Also, is there a better way of finding the sum of the lengths of the lines using the functionals map, filter, and reduce? I did it for the first part but couldn't figure out the second half. I am fairly new to Python, so any help would be great.

Here is the test file called cmtest.py:

import prompt,math

x = prompt.for_int('Enter x')
print(x,'!=',math.factorial(x),sep='')

First line has 18 characters (including white space)
Second line has 29 characters
Third line has 38 characters

[(1, 18), (1, 29), (1, 38)]

The total length is 85 characters, including whitespace. I apologize, I misread the problem. The length total for each line should include the whitespace as well.

Answer 1:

A fairly simple approach is to build a generator that strips trailing whitespace, then enumerate over it (with a start value of 1) while filtering out blank lines, summing the length of each line in turn, e.g.:

def code_metric(filename):
    line_count = char_count = 0
    with open(filename) as fin:
        stripped = (line.rstrip() for line in fin)
        for line_count, line in enumerate(filter(None, stripped), 1):
            char_count += len(line)
    return line_count, char_count

print(code_metric('cmtest.py'))
# (3, 85)
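
As for why the original version reports 0 characters: a file object is an iterator, so the first filter pass consumes it and the second pass over f finds nothing left to read. Below is a minimal sketch of a map/filter/reduce variant, assuming the file is small enough to read into a list first and that trailing whitespace should not count (as in the version above). The name code_metric_functional is only illustrative.

```python
from functools import reduce


def code_metric_functional(filename):
    """Return (number of non-empty lines, total length of those lines)."""
    with open(filename) as fin:
        # Read once into a list: a file object is a one-shot iterator,
        # so iterating over it a second time would yield nothing.
        stripped = [line.rstrip() for line in fin]

    # filter(None, ...) drops empty strings, i.e. blank lines.
    non_empty = list(filter(None, stripped))
    # reduce adds up the line lengths; sum(map(len, non_empty)) is equivalent.
    total = reduce(lambda acc, n: acc + n, map(len, non_empty), 0)
    return len(non_empty), total
```

Calling code_metric_functional('cmtest.py') on the test file from the question should give the same pair as the loop version.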


Answer 2:

To count the lines, this code may be cleaner (note that it counts every line, including blank ones):

with open(file) as f:
    lines = len(f.readlines())

For the second part of your program, if you intend to count only non-whitespace characters, remember that '\t' and '\n' have to be removed as well. In that case:

import re

with open(file) as f:
    num_chars = len(re.sub(r'\s', '', f.read()))

Some people have advised you to do both things in one loop. That is fine, but if you keep them separate you can turn them into different functions and reuse them more easily. Unless you are handling huge files (or executing this code millions of times), it shouldn't matter in terms of performance.
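
A rough sketch of what keeping the two steps separate might look like. The helper names (read_stripped_lines, count_nonempty_lines, total_nonempty_length) are made up for illustration, and this assumes, as the edit to the question says, that whitespace inside a line counts toward its length while the trailing newline does not.

```python
def read_stripped_lines(filename):
    """Return the file's lines with trailing whitespace removed."""
    with open(filename) as f:
        return [line.rstrip() for line in f]


def count_nonempty_lines(lines):
    """Count the lines that still contain something after stripping."""
    return sum(1 for line in lines if line)


def total_nonempty_length(lines):
    """Sum the lengths of the non-empty lines."""
    return sum(len(line) for line in lines if line)


def code_metric(filename):
    """Combine the two reusable helpers into the requested 2-tuple."""
    lines = read_stripped_lines(filename)
    return count_nonempty_lines(lines), total_nonempty_length(lines)
```

Because the two counting helpers take a plain list of strings, they can also be reused on text that did not come from a file.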