I'm currently reviewing a very old C++ project and see lots of code duplication there.
For example, there is a class with 5 MFC message handlers, each holding 10 identical lines of code. Or there is a 5-line snippet for a very specific string transformation repeated here and there. Reducing the duplication would not be a problem at all in these cases.
But I have a strange feeling that I might be misunderstanding something and that there was originally a reason for this duplication.
What could be a valid reason for duplicating code?
You might want to do so to make sure that future changes in one part will not unintentionally change the other part. For example, consider
Now you can prevent "code duplication" with a function like this:
However, there is a risk that some other programmer will want to change Do_A_Policy(), will do so by changing first_policy(), and will thereby also change Do_B_Policy(), a side effect the programmer may not be aware of. So this kind of "code duplication" can serve as a safety mechanism against this kind of future change in the program.
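A minimal sketch of the scenario described above. The names Do_A_Policy(), Do_B_Policy() and first_policy() come from the answer; the signatures, the bodies, and the `_Duplicated` variants are assumptions made up for illustration only.

```cpp
#include <string>

// Hypothetical shared helper: the body is invented for this sketch.
std::string first_policy(const std::string& input) {
    return "[" + input + "]";
}

// De-duplicated version: both policies route through first_policy(),
// so editing first_policy() silently changes both of them.
std::string Do_A_Policy(const std::string& input) { return first_policy(input); }
std::string Do_B_Policy(const std::string& input) { return first_policy(input); }

// Duplicated version (hypothetical names): each policy owns its copy,
// so a change to one cannot leak into the other.
std::string Do_A_Policy_Duplicated(const std::string& input) {
    return "[" + input + "]";
}
std::string Do_B_Policy_Duplicated(const std::string& input) {
    return "[" + input + "]";
}
```

In the shared version, someone who edits first_policy() to adjust policy A also changes policy B without touching any line that mentions B. In the duplicated version that coupling does not exist, at the price of having to fix a common bug twice.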
For that kind of code duplication (lots of lines duplicated lots of times), I'd say :
Probably the first solution, though, from what I've generally seen :-(
The best solution I've seen against that: have your developers start by maintaining some old application when they are hired. That will teach them that this kind of thing is not good... And they will understand why, which is the most important part.
Splitting code into several functions, re-using code the right way, and all that often comes with experience, or you have not hired the right people ;-)
On large projects (those with a code base as large as a GB) it's quite possible to lose track of an existing API. This is typically due to insufficient documentation, or the programmer's inability to locate the original code; hence the duplicate code.
Boils down to laziness, or poor review practice.
EDIT:
One additional possibility is that there may have been additional code in those methods which was removed along the way.
Have you looked at the revision history on the file?
Besides inexperience, here are some reasons why duplicated code might show up:
No time to properly refactor
Most of us work in the real world, where real constraints force us to move on quickly to real problems instead of thinking about the niceness of the code. So we copy and paste and move on. For me, if I later see that the code has been duplicated several more times, that is the sign that I have to spend some more time on it and converge all instances into one.
Generalization of the code not possible/not 'pretty' due to language constraints
Let's say that deep inside a function you have several statements that differ greatly from one instance of the duplicated code to the next. For example: I have a function that draws a 2D array of thumbnails for a video, and the calculation of each thumbnail's position is embedded in it. To do hit-testing (calculating the thumbnail index from a click position) I use the same code, but without the painting.
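The thumbnail case could look roughly like the sketch below. Everything in it is hypothetical (the `Rect` type, the grid constants, the function names, and the `drawn` vector standing in for real painting); it only illustrates how the position calculation ends up repeated in both loops.

```cpp
#include <vector>

struct Rect { int x, y, w, h; };

// Hypothetical grid layout constants.
const int kCols = 4, kThumbW = 100, kThumbH = 80, kGap = 10;

// Painting: walks the grid and computes each thumbnail's rectangle.
// Recording the rectangle stands in for the actual drawing call.
void paint_thumbnails(int count, std::vector<Rect>& drawn) {
    for (int i = 0; i < count; ++i) {
        int col = i % kCols, row = i / kCols;
        Rect r{col * (kThumbW + kGap), row * (kThumbH + kGap), kThumbW, kThumbH};
        drawn.push_back(r);
    }
}

// Hit-testing: the same walk and the same position calculation,
// but the loop body tests a point instead of painting.
int hit_test(int count, int px, int py) {
    for (int i = 0; i < count; ++i) {
        int col = i % kCols, row = i / kCols;
        Rect r{col * (kThumbW + kGap), row * (kThumbH + kGap), kThumbW, kThumbH};
        if (px >= r.x && px < r.x + r.w && py >= r.y && py < r.y + r.h) return i;
    }
    return -1;  // click landed in a gap or outside the grid
}
```

With lambdas or templates the shared walk can be factored out, but in older C++ the alternatives (function pointers, macros, a visitor class) were often judged uglier than the copy.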
You are not sure that there will be generalization at all
Duplicate code at first, and later observe how it will evolve. Since we are writing software, we can allow 'as late as possible' modifications to the software, since everything is 'soft' and changeable.
I'll add more if I remember something else.
Added later...
Loop unrolling
Back before compilers were as smart as Einstein and Hawking combined, you had to unroll loops or inline code by hand to make it faster. Loop unrolling duplicates your code and probably makes it faster by a few percent, if the compiler didn't do it for you anyway.
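A small sketch of what manual unrolling looks like. The function names are made up for illustration, and no speed-up is claimed here: a modern optimizing compiler may well produce similar machine code for both versions.

```cpp
#include <cstddef>
#include <vector>

// Straightforward loop: one addition per iteration.
int sum_rolled(const std::vector<int>& v) {
    int total = 0;
    for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
    return total;
}

// Manually unrolled by four: the loop body is duplicated to reduce
// loop overhead, and a second loop handles the leftover elements.
int sum_unrolled(const std::vector<int>& v) {
    int total = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }
    for (; i < n; ++i) total += v[i];  // remainder (0 to 3 elements)
    return total;
}
```

Both functions return the same result; the unrolled one simply carries four copies of the same statement, which is exactly the kind of duplication you may find in old performance-sensitive code.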
Laziness, that's the only reason I can think of.
On a more serious note: the only valid reason I can think of is changes at the very end of the product cycle. These tend to undergo a lot more scrutiny, and the smallest change tends to have the highest rate of success. In that limited circumstance it is easier to get a change that duplicates code approved than a refactoring, because the duplication touches less existing code.
Still leaves a bad taste in my mouth.
Sometimes methods and classes have nothing in common domain-wise but look a lot alike implementation-wise. In these cases it's often better to duplicate the code, as future changes will more often than not cause these implementations to diverge into something that is no longer the same.