Do you know if there's a Python library that generates statistics about code? I'm thinking of pointing it at a package and getting the number of classes, functions, methods, docblock lines, etc.
It could eventually include useless stuff like number of lambdas or other crazy statistics, just for fun.
You can have a look at Pymetrics, or check the other tools enumerated there.
People don't generally make packages out of things that can be done in a dozen or two lines of code. The following analyzes usage of all Python syntax and returns a dictionary mapping each AST node type to how many times that kind of node came up in the source. Examples showing the number of def and class statements are below it as well.
import ast
import collections
import os


def analyze(packagedir):
    """Count how often each AST node type occurs in the .py files under packagedir."""
    stats = collections.defaultdict(int)
    for dirpath, dirnames, filenames in os.walk(packagedir):
        for filename in filenames:
            if not filename.endswith('.py'):
                continue
            filename = os.path.join(dirpath, filename)
            # Use a context manager so the file handle is closed promptly.
            with open(filename) as source_file:
                syntax_tree = ast.parse(source_file.read(), filename)
            for node in ast.walk(syntax_tree):
                stats[type(node)] += 1
    return stats


# Walk the tree once and reuse the result for both counts.
stats = analyze('.')
print("Number of def statements:", stats[ast.FunctionDef])
print("Number of class statements:", stats[ast.ClassDef])
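The docblock lines and lambdas mentioned in the question can be counted with the same kind of AST walk. Here is a minimal sketch, assuming Python 3.8 or later; the helper name `docstring_and_lambda_stats` is made up for illustration, and it takes a source string rather than a directory to keep the example short:

```python
import ast


def docstring_and_lambda_stats(source):
    """Count docstring lines and lambda expressions in a source string."""
    tree = ast.parse(source)
    docstring_lines = 0
    lambdas = 0
    # Only these node types can carry a docstring.
    documentable = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(tree):
        if isinstance(node, documentable):
            # get_docstring returns None when the node has no docstring,
            # and by default strips indentation and surrounding blank lines.
            doc = ast.get_docstring(node)
            if doc:
                docstring_lines += len(doc.splitlines())
        elif isinstance(node, ast.Lambda):
            lambdas += 1
    return {"docstring_lines": docstring_lines, "lambdas": lambdas}


sample = '''
"""Module docstring."""

def double(x):
    """Return twice x.

    Second paragraph.
    """
    return (lambda y: 2 * y)(x)
'''
print(docstring_and_lambda_stats(sample))
```

Note that the docstring count here is based on the cleaned docstring text, so blank lines inside a docstring are counted but the lines holding only the closing quotes are not.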
Maybe Tahar can help: it displays statistics about how long each function, method, class and module is (in lines of code). However, since it uses the inspect module, which works on imported modules and therefore runs their top-level code, it may behave in unexpected ways if one of the modules it analyzes launches a GUI or something like that.
I'll switch to using AST someday, although I don't know whether AST can provide a service similar to inspect.getsourcelines().
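On the inspect.getsourcelines() question: since Python 3.8, AST nodes for definitions carry `lineno` and `end_lineno`, and `ast.get_source_segment()` returns the matching source text, so lengths can be measured without importing the analyzed module. The following is a rough sketch under that assumption, not Tahar's actual code; `definition_lengths` is a hypothetical helper name:

```python
import ast


def definition_lengths(source):
    """Map each function/class name to its length in source lines.

    Keyed by bare name, so two definitions with the same name overwrite
    each other; a real tool would use qualified names.
    """
    tree = ast.parse(source)
    lengths = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno is the 'def'/'class' line (decorators are not included);
            # end_lineno requires Python 3.8+.
            lengths[node.name] = node.end_lineno - node.lineno + 1
    return lengths


sample = """\
class Greeter:
    def greet(self, name):
        message = "Hello, " + name
        return message

def main():
    print(Greeter().greet("world"))
"""
print(definition_lengths(sample))

# Closest equivalent of getsourcelines(): recover the text of one node.
first_definition = ast.parse(sample).body[0]
print(ast.get_source_segment(sample, first_definition))
```

Because the source is only parsed and never executed, a module that launches a GUI at import time is harmless here.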
(EDIT)
Mergou (the rewrite of Tahar using the tokenize module) is in alpha; here's a video of it in action: http://www.youtube.com/watch?v=PI0iBZmInFU&feature=youtu.be