Any valid reason for code duplication? [closed]

2019-02-16 05:27发布

问题:

I'm currently reviewing a very old C++ project and see lots of code duplication there.

For example, there is a class with 5 MFC message handlers each holding 10 identical lines of code. Or there is a 5-line snippet for a very specific string transformation every here and there. Reducing code duplication is not a problem in these cases at all.

But I have a strange feeling that I might be misunderstanding something and that there was originally a reason for this duplication.

What could be a valid reason for duplicating code?

回答1:

When I first started programming, I wrote an app where I had a bunch of similar functionality which I wrapped up in a neat little 20-30 line function ... I was very proud of myself for writing such an elegant piece of code.

Shortly after, the client changed the process in very specific cases, then again, then again, then again , and again, and again .... (many many more times) My elegant code turned into a very difficult, hackish, buggy, & high maintenance mess.

A year later, when I was asked to do something very similar, I deliberately decided to ignore DRY. I put together the basic process, and generated all duplicate code. The duplicate code was documented and I saved the template used to generate the code. When the client asked for specific conditional change (like, if x == y^z + b then 1+2 == 3.42) it was a piece of cake. It was unbelievably easy to maintain & change.

In retrospect, I probably could have solved many of these problems with function pointers and predicates, but using the knowledge I had at the time, I still believe in this specific case, this was the best decision.



回答2:

A good read about this is large scale c++ software design by John Lakos.

He has many good points about code duplication, where it might help or hinder a project.

The most important point is asking when deciding to remove duplication or duplicate code:

If this method changes in the future, do I want to change the behaviour in the duplicated method, or needs it to stay the way it is?

After all, methods contain (business) logic, and sometimes you'll want to change the logic for every caller, sometimes not. Depends on the circumstances.

In the end, it's all about maintenance, not about pretty source.



回答3:

Laziness, that's the only reason I can think of.

On a more serious note. The only valid reason I can think of is changes at the very end of the product cycle. These tend to undergo a lot more scrutiny and the smallest change tends to have the highest rate of success. In that limited circumstance it is easier to get a code duplication change through as opposed to refactoring out a smaller change.

Still leaves a bad taste in my mouth.



回答4:

Besides being inexperienced, there is why duplicated code occurrences might show up:

No time to properly refactor

Most of us are working in a real world where real constraints force us to move quickly to real problems instead of thinking about niceness of the code. So we copy&paste and move on. With me, if I later see that code is duplicated several more times, it is the sign that I have to spend some more time on it and converge all instances to one.

Generalization of the code not possible/not 'pretty' due to language constraints

Lets say that deep inside a function you have several statements that greatly differ from instance to instance of same duplicated code. For example: I have a function that draws 2d array of thumbnails for the video, and it's embedded with calculation of each thumbnail position. In order to calculate hit-test (calculate thumbnail index from click position) I am using same code but without painting.

You are not sure that there will be generalization at all

Duplicate code at first, and later observe how it will evolve. Since we are writing software, we can allow 'as late as possible' modifications to the software, since everything is 'soft' and changeable.

I'll add more if I remember something else.


Added later...

Loop unrolling

In time before compilers were smart as Einstein and Hawking combined, you had to unroll the loops or inline code to be faster. Loop unrolling will make your code to be duplicated, and probably faster by few percents, it compiler didn't do it for you anyway.



回答5:

You might want to do so to make sure that future changes in one part will not unintentionally change the other part. for example consider

Do_A_Policy()
{
  printf("%d",1);
  printf("%d",2);
}

Do_B_Policy()
{
  printf("%d",1);
  printf("%d",2);
}

Now you can prevent "code duplication" with function like this:

first_policy()
{
printf("%d",1);
printf("%d",2);
}

Do_A_Policy()
{
first_policy()
}

Do_B_Policy()
{
first_policy()
}

However there is a risk that some other programmer will want to change Do_A_Policy() and will do so by changing first_policy() and will cause the side effect of changing Do_B_Policy(), a side effect which the programmer may not be aware of. so this kind of "code duplication" can serve as a safety mechanism against this kind of future changes in the program.



回答6:

Sometimes methods and classes which domain-wise have nothing in common, but implementation-wise looks a lot alike. In these cases it's often better to do code duplication as future changes more often that not will branch these implementations into something that aren't the same.



回答7:

The valid reason I can think of: If the code gets alot more complex to avoid the duplication. Basically that's the place when you do almost the same in several methods - but just not quite the same. Of course - you can then refactor and add special parameters including pointers to different members that have to be modified. But the new, refactored method may get too complicated.

Example (pseudocode):

procedure setPropertyStart(adress, mode, value)
begin
  d:=getObject(adress)
  case mode do
  begin
    single: 
       d.setStart(SingleMode, value);
    delta:
       //do some calculations
       d.setStart(DeltaSingle, calculatedValue);
   ...
end;

procedure setPropertyStop(adress, mode, value)
begin
  d:=getObject(adress)
  case mode do
  begin
    single: 
       d.setStop(SingleMode, value);
    delta:
       //do some calculations
       d.setStop(DeltaSingle, calculatedValue);
   ...
end;

You could refactor out the method call (setXXX) somehow - but depending on the language it could be difficult (especially with inheritance). It is code duplication since most of the body is the same for each property, but it can be hard to refactor out the common parts.

In short - if the refactored method is factors more complicated, I'd go with code duplication although it is "evil" (and will stay evil).



回答8:

The only "valid" thing I can see this arising from is when those lines of code were different, then converged to the same thing through subsequent edits. I've had this happen to me before, but none too frequently.

This is, of course, when it's time to factor out this common segment of code into new functionality.

That said, I can't think of any reasonable way to justify duplicate code. Look at why it's bad.

It's bad because a change in one place requires a change in multiple places. This is increased time, with a chance of bugs. By factoring it out, you maintain the code in a single, working location. After all, when you write a program you don't write it twice, why would a function be any different?



回答9:

For that kind of code duplication (lots of lines duplicated lots of times), I'd say :

  • either laziness (you just paste some code here and there, without having to worry about any impact it could have on other parts of the application -- while writing a new function and using it in two places could, I suppose, have some impact)
  • or not knowing any good practice (re-using code, separating different tasks in different functions/methods)

Probably the first solution, though, from what I've generally seen :-(

Best solution I've seen against that : have your developpers start by maintaining some old application, when they are hired -- that'll teach them that this kind of thing is not good... And they will understand why, which is the most important part.

Splitting code into several functions, re-using code the right way, and all that often come with experience -- or you have not hired the right people ;-)



回答10:

A long time ago when I used to do graphics programming you would, in some special cases, use duplicate code this way to avoid the low level JMP statements generated in the code (it would improve performance by avoiding the jump to the label/function). It was a way to optimize and do a pseudo "inlining".

However, in this case, I don't think that's why they were doing it, heh.



回答11:

If different tasks are similar by accident, repeating the same actions in two places is not necessarily duplication. If the actions in one place change, is it probable they should change in other places as well? Then this is duplication you should avoid or refactor away.

Also, sometimes - even when logic is duplicated - the cost of reducing duplication is too high. This can happen especially when it's not just code duplication: for example, if you have a record of data with certain fields that repeats itself in different places (DB table definition, C++ class, text-based input), the usual way to reduce this duplication is with code generation. This adds complexity to your solution. Almost always, this complexity pays off, but sometimes it doesn't - it's your tradeoff to make.



回答12:

I don't know of many good reasons for code duplication, but rather than jumping in feet first to refactoring, it's probably better to only refactor those bits of the code that you actually change, rather than altering a large codebase that you don't yet fully understand.



回答13:

Sounds like the original author either was inexperienced and/or was hard pressed on time. Most experienced programmers bunch together things that are reused because later there will be less maintenance - a form of laziness.

The only thing you should check is if there are any side effects, if the copied code accesses some global data a bit refactoring may be needed.

edit: back in the day when compilers were crappy and optimizers even crappier it could happen that due to some bug in the compiler one may had to do such a trick in order to get around a bug. Maybe its something like that? How old is old?



回答14:

On large projects ( those with a code-base as large as a GB ) it's quite possible to lose existing API. This is typically due to insufficient documentation, or an inability of the programmer to locate the original code; hence duplicate code.

Boils down to laziness, or poor review practice.

EDIT:

One additional possibility is that there may have been additional code in those methods which was removed along the way.

Have you looked at the revision history on the file?



回答15:

All the answers looks right, but I think there is another possibility. Maybe there are performance considerations as the things you say reminds me "inlining code". It's always faster to inline functions that to call them. Maybe the code you look at has been preprocessed first?



回答16:

I have no problems with duplicated code when it is produced by a source code generator.



回答17:

Something that we found that forced us to duplicate code was our pixel manipulation code. We work with VERY large images and the function call overhead was eating up on the order of 30% of our per-pixel time.

Duplicating the pixel manipulation code gave us 20% faster image traversal at the cost of code complexity.

This is obviously a very rare case, and in the end it bloated our source significantly (a 300 line function is now 1200 lines).



回答18:

There is no good reason for code duplication.

See the Refactor Mercilessly design pattern.

The original programmer was either in a hurry to meet a deadline or lazy. Feel free to refactor and improve the code.



回答19:

in my humble opinion there's no place for code duplication. have a look, for example, at this wikipedia article

or, let's refer to Larry Wall's citation:

"We will encourage you to develop the three great virtues of a programmer: laziness, impatience, and hubris."

it is pretty clear that code duplication has nothing to do with "laziness". haha;)



回答20:

Since there is the "Strategy Pattern", there is no valid reason for duplicate code. Not a single line of code must be duplicated, everything else is epic fail.