I am working on a project where previous programmers copy-pasted code all over the place. These fragments are identical (or very similar) and could have been refactored into one.
I have spent countless hours refactoring this code manually, but I think there must be a better way. Some of the duplicates are trivial static methods that could have been moved into an ancestor class (but were instead copy-pasted everywhere by previous junior programmers).
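To make the static-method case concrete, here is a minimal sketch of the end state I am aiming for. All names (`BaseReport`, `OrderReport`, `InvoiceReport`, `formatCents`) are hypothetical and only illustrate the pattern; they are not taken from the actual project.

```java
import java.util.Locale;

// Hypothetical example: before the refactoring, OrderReport and InvoiceReport
// each carried an identical copy of formatCents(). Afterwards the single copy
// lives in their common ancestor.
abstract class BaseReport {
    // Shared helper: formats a non-negative amount in cents as "units.cc".
    // (Negative amounts are not handled in this sketch.)
    static String formatCents(long cents) {
        return String.format(Locale.ROOT, "%d.%02d", cents / 100, cents % 100);
    }

    abstract String render(long cents);
}

class OrderReport extends BaseReport {
    @Override
    String render(long cents) {
        // Uses the inherited helper instead of a private copy.
        return "Order total: " + formatCents(cents);
    }
}

class InvoiceReport extends BaseReport {
    @Override
    String render(long cents) {
        return "Invoice total: " + formatCents(cents);
    }
}
```

Doing this by hand is easy for one method; the hard part is finding every copy, which is why I am looking for a tool.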
Is there a code analysis tool that can detect this and provide reports/recommendations? I would prefer a free/open-source tool if possible.
I use the following tools:
- PMD/CPD (BSD-style License).
- Checkstyle (LGPL License) - support was removed, see details.
Both tools have offered code duplication detection (Checkstyle has since removed it), but neither of them can advise you on how to refactor your code.
JetBrains IntelliJ IDEA Ultimate has good static code analysis with code duplication support, but it is not free.
Most of the tools listed on the Wikipedia article on Duplicate Code Tools will detect duplicates in many different languages, including Java.
SonarQube can detect duplicated code but does not give recommendations on eliminating it. It is free, and although the default setup can only detect lexically identical clones, there is a free CodeAnalyzer for SonarQube plugin that can detect more sophisticated structural clones rather than only lexical ones.
Either Simian or PMD's CPD. The former supports a wider set of languages but is not free for commercial projects.
http://checkstyle.sourceforge.net/ has support for finding duplicates.
See our SD Java CloneDR, a tool for detecting exact and near-miss duplicate code in large Java systems.
CloneDR will find code clones in spite of whitespace changes, line breaks, comment insertions or deletions, modification of constants or identifiers, and in a number of cases, even replacement of one statement by another or by a block of statements.
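To illustrate what "near-miss" means in practice, here is a rough sketch of one simple way to make a comparison insensitive to whitespace, comments, identifiers and constants: normalize both fragments into a token stream and compare the results. This is not CloneDR's actual algorithm, only a toy stand-in; the class `CloneNormalizer`, its method names and the short keyword list are all assumptions made for the example, and it requires Java 9+ for `Set.of`.

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any real tool.
class CloneNormalizer {
    // Deliberately incomplete keyword list; enough for a demonstration.
    private static final Set<String> KEYWORDS = Set.of(
        "int", "long", "return", "if", "else", "for", "while",
        "static", "void", "new", "class");

    // Alternatives, in priority order: line comment, block comment,
    // string literal, identifier/keyword, number, any other non-space char.
    private static final Pattern TOKEN = Pattern.compile(
        "//[^\\n]*|/\\*.*?\\*/|\"(?:\\\\.|[^\"\\\\])*\""
        + "|[A-Za-z_][A-Za-z_0-9]*|\\d+(?:\\.\\d+)?|\\S",
        Pattern.DOTALL);

    // Drops comments and whitespace, replaces identifiers with ID and
    // literals with LIT, and keeps keywords and punctuation as-is.
    static String normalize(String source) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            String t = m.group();
            char c = t.charAt(0);
            if (t.startsWith("//") || t.startsWith("/*")) {
                continue; // comments do not affect the comparison
            }
            if (c == '"' || Character.isDigit(c)) {
                out.append("LIT");
            } else if ((Character.isLetter(c) || c == '_') && !KEYWORDS.contains(t)) {
                out.append("ID");
            } else {
                out.append(t);
            }
            out.append(' ');
        }
        return out.toString().trim();
    }

    // Two fragments are treated as clones if their normalized forms match.
    static boolean looksLikeClone(String a, String b) {
        return normalize(a).equals(normalize(b));
    }
}
```

With this sketch, `int total = price * 2; // double it` and `int   sum=cost*10;` normalize to the same token string, while `int sum = cost + 10;` does not. Mapping every identifier to the same placeholder is crude (it ignores whether renames are consistent) and it cannot detect a statement replaced by a different statement, which is where a tool working on the program's structure has the advantage.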
It reports where each set of clones is found, shows each individual clone, and presents an abstraction that captures what the clones have in common, together with the parameters that show how each clone instance can be derived from that abstraction.
It typically finds that 10-20% of the code in most Java systems is cloned.