Duplicate code can be hard to find, especially in a large project. But PMD's Copy/Paste Detector (CPD) can find it for you! CPD has been through three major incarnations:
First we wrote it using a variant of Michael Wise's Greedy String Tiling algorithm (our variant is described here)
Then it was completely rewritten by Brian Ewins using the Burrows-Wheeler transform
Finally, it was rewritten by Steve Hawkins to use the Karp-Rabin string matching algorithm.
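To give a feel for the Karp-Rabin idea mentioned in the last item, here is a minimal sketch of a rolling hash over a token stream. It is not CPD's actual code: the function name `find_duplicate_windows`, the `base` and `mod` constants, and the whitespace tokenizer in the usage note are all assumptions made for illustration.

```python
from collections import defaultdict


def find_duplicate_windows(tokens, window, base=257, mod=(1 << 61) - 1):
    """Return groups of start indices whose `window`-token runs are identical.

    Illustrative Karp-Rabin sketch: hash every window with a rolling hash,
    then compare the actual tokens only for windows whose hashes collide.
    """
    n = len(tokens)
    if window <= 0 or n < window:
        return []

    # Map each distinct token to a small positive integer (deterministic).
    ids = {}
    codes = [ids.setdefault(t, len(ids) + 1) for t in tokens]

    # Weight of the token that leaves the window on each slide.
    high = pow(base, window - 1, mod)

    # Hash of the first window.
    h = 0
    for c in codes[:window]:
        h = (h * base + c) % mod

    buckets = defaultdict(list)
    buckets[h].append(0)

    # Slide the window one token at a time, updating the hash in O(1).
    for i in range(1, n - window + 1):
        h = (h - codes[i - 1] * high) % mod
        h = (h * base + codes[i + window - 1]) % mod
        buckets[h].append(i)

    groups = []
    for starts in buckets.values():
        if len(starts) < 2:
            continue
        # Verify the real tokens to rule out hash collisions.
        by_content = defaultdict(list)
        for s in starts:
            by_content[tuple(tokens[s:s + window])].append(s)
        groups.extend(g for g in by_content.values() if len(g) > 1)
    return groups
```

With a naive whitespace split, `find_duplicate_windows("a = b + c ; x = 1 ; a = b + c ;".split(), 6)` reports the windows starting at tokens 0 and 10 as one duplicate group. A real detector would use a proper lexer for the language instead of `split()`.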
...
Note that CPD works with Java, JSP, C, C++, Fortran and PHP code.
Be aware that you can't just compare lines of text. You will have to parse the code; that way, you can also detect segments that are semantically equivalent but use differently named identifiers.
For example, given two functions that are equivalent but use different identifiers, a text search will not see them as identical, but a parser can.
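A small sketch of that point, using Python only because its standard library ships a parser (the `ast` module); the same idea applies to C++ or Java with a suitable front end. The helper name `normalized_form` and the `id0`, `id1`, ... placeholder scheme are my own assumptions, not taken from any of the tools mentioned here.

```python
import ast


class _RenameIdentifiers(ast.NodeTransformer):
    """Rename identifiers to id0, id1, ... in order of first appearance."""

    def __init__(self):
        self.names = {}

    def _canon(self, name):
        # The default is computed before insertion, so numbering starts at 0.
        return self.names.setdefault(name, f"id{len(self.names)}")

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)  # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._canon(node.id)
        return node


def normalized_form(source):
    """Parse `source` and return a dump of its AST with identifiers canonicalized."""
    tree = _RenameIdentifiers().visit(ast.parse(source))
    return ast.dump(tree)


first = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
second = "def sum_all(values):\n    result = 0\n    for v in values:\n        result += v\n    return result\n"

# The text differs, but the parsed and normalized forms are identical.
print(first == second)
print(normalized_form(first) == normalized_form(second))
```

The first comparison is `False` and the second is `True`: a plain text search sees two different functions, while the parsed form sees the same structure. Note that this sketch renames every name, including calls to built-ins, which is fine for a demonstration but cruder than what a real tool would do.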
Also note that writing a C++ parser is not a trivial task, even when given the grammar. I second the advice of others: seek out an existing tool for this. Also search for refactoring tools.
Simian (noted earlier) is a good tool for this. I have been using CloneDetective on my project and it works great. CloneDetective is free, so it can't hurt to give it a try.
See CloneDR, a tool for finding exact-copy and near-miss (copy-paste-edit) clones in source code. It uses full language parsers so that it can find clones according to the language structure, minimizing false positives, and so that it is completely independent of how the code is commented or formatted, thereby maximizing true detection. CloneDR will find clones even when the cloned block has renamed variables or inserted statements or blocks of code.
It has language front ends for C, C++, COBOL, C#, Java, PHP and a number of other languages.
You can see sample clone detection reports at the website.
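To illustrate what a "near-miss" clone looks like at the syntax-tree level, here is a rough sketch in Python. This is not how CloneDR works internally; it is a swapped-in technique that blanks out variable names, fingerprints each top-level statement, and scores two snippets with `difflib.SequenceMatcher`. The names `statement_shapes` and `clone_similarity` are hypothetical.

```python
import ast
import difflib


class _BlankIdentifiers(ast.NodeTransformer):
    """Replace every variable name with the same placeholder."""

    def visit_Name(self, node):
        node.id = "_"
        return node


def statement_shapes(source):
    """Return one structural fingerprint per top-level statement."""
    tree = _BlankIdentifiers().visit(ast.parse(source))
    return [ast.dump(stmt) for stmt in tree.body]


def clone_similarity(source_a, source_b):
    """Score two snippets from 0.0 to 1.0 by how many statement shapes line up."""
    matcher = difflib.SequenceMatcher(
        None,
        statement_shapes(source_a),
        statement_shapes(source_b),
        autojunk=False,
    )
    return matcher.ratio()


original = "check(w)\nr = w * h\nprint(r)\n"
# Same block with renamed variables and one inserted statement.
edited = "check(a)\nlog(a, b)\ns = a * b\nprint(s)\n"

print(clone_similarity(original, edited))
```

The edited copy still scores high but below 1.0, because three of its four statements line up with the original despite the renaming and the inserted call. Blanking all names to one placeholder is deliberately loose (it cannot tell `check(w)` from `print(r)`), so treat this as a demonstration of the idea rather than a usable detector.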
A subset of your problem: Detecting duplicate code:
Try: PMD
https://github.com/hudayou/fib
Tool to find identical code blocks in a file or directory.
You'll want to take a look at Simian. It's free for noncommercial projects. Try something like: