I've seen How can you find unused functions in Python code? but that's really old, and doesn't really answer my question.
I have a large python project with multiple libraries that are shared by multiple entry point scripts. This project has been accreting for many years with many authors, so there's a whole lot of dead code. You know the drill.
I know that finding all dead code is un-decidable. All I need is a tool that will find all functions that are not called anywhere. We're not doing anything fancy with calling functions based on the string of the function name, so I'm not worried about anything pathological...
I just installed pylint, but it appears to be file based, and not paying much attention to interfile dependencies, or even function dependencies.
Clearly, I could grep for def in all of the files, get all of the function names from that, and do a grep for each of those function names. I'm just hoping that there's something a little smarter than that out there already.
ETA: Please note that I don't expect or want something perfect. I know my halting-problem-proof just as well anyone (No really I taught theory of computation I know when I'm looking at something that is recursively enumerable). Any thing that tries to approximate it by actually running the code is going to take way too long. I just want something that syntactically goes through the code and says "This function is definitely used. This function MIGHT be used, and this function is definitely NOT used, no one else even seems to know it exists!" And the first two categories aren't important.
If you have your code covered with a lot of tests (it is quite useful at all), run them with code-coverage plugin and you can see unused code then .)
With the following line you can list all function definitions that are obviously not used as an attribute, a function call, a decorator or a return value. So it is approximately what you are looking for. It is not perfect, it is slow, but I never got any false positives. (With linux you have to replace
ack
withack-grep
)Try running Ned Batchelder's coverage.py.
It is very hard to determine which functions and methods are called without executing the code, even if the code doesn't do any fancy stuff. Plain function invocations are rather easy to detect, but method calls are really hard. Just a simple example:
Nothing fancy going on here, but any script that tries to determine whether
A.f()
orB.f()
is called will have a rather hard time to do so without actually executing the code.While the above code doesn't do anything useful, it certainly uses patterns that appear in real code -- namely putting instances in containers. Real code will usually do even more complex things -- pickling and unpickling, hierarchical data structures, conditionals.
As stated before, just detecting plain function invocations of the form
or
will be rather easy. You can use the
ast
module to parse your source files. You will need to record all imports, and the names used to import other modules. You will also need to track top-level function definitions and the calls inside these functions. This will give you a dependency graph, and you can use NetworkX to detect the connected components of this graph.While this might sound rather complex, it can probably done with less than 100 lines of code. Unfortunately, almost all major Python projects use classes and methods, so it will be of little help.
IMO that could be achieved pretty quickly with a simple pylint plugin that :
Then you would have to call pylint on all your code base to get something that make sense. Of course as said this would need to checked, as there may have been inference failures or such that would introduce false positive. Anyway that would probably greatly reduce the number of grep to be done.
I've not much time to do it myself yet but anyone would find help on the python-projects@logilab.org mailing list.
Here's the solution I'm using at least tentatively:
Then I look at the individual functions that have very few references (< 3 say)
it's ugly, and it only gives me approximate answers, but I think it's good enough for a start. What are you-all's thoughts?