How fanatic are you about elimination of duplicate code?
Personally, whenever I see duplicate code, either in test code or in production, I tend to refactor the duplication away. The only exceptions I make are these:
- Sometimes the reduction in duplication is minimal, because the newly refactored method has too many parameters to be actually useful / readable.
- Sometimes, in test code, when several tests use the same piece of code that's not really a coherent flow I leave the duplication alone (but not always - depending on the dup size).
We work at it. It really helps to have a tool that detects such duplication; regardless of best intentions, it happens because one doesn't think, or there's a tight schedule, etc.
The CloneDR finds duplicate code, both exact copies and near-misses, across large source systems, parameterized by language syntax. It supports Java, C#, COBOL, C++, PHP and many other languages. We use it ourselves to help manage our own code.
Since duplication invites further copy-paste, I always try to avoid it in new code and refactor it away where it already exists.
I was pretty fanatical to begin with, but a recent experience has probably made me more so, and given me another set of tools to use. Specifically, algorithms/concepts from bioinformatics. In a new position we are updating the web UI to use CSS-driven layout instead of tables, so I am analyzing 700 existing JSP files. I put all the lines of code into a database and of 100K total lines, fewer than 20K were unique. Then I represent each file as a sequence of line ids, and find the common subsequences of 2 or more lines; the longest was almost 300 lines duplicated between a couple of JSP files, an egregious case of cut and paste. That's where I stand now, but my next plan is to re-represent the files as a sequence of line_id's OR (common) subsequence_id's, sort them, and then perform a Levenshtein comparison of files contiguous to one another within the sort order. This should help in fuzzy matching of files that contain not just common subsequences, but subsequences that are off by one and such.
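The steps above can be sketched roughly as follows. This is only an illustration under some assumptions, not the actual tooling: it keeps everything in memory instead of a database, it reads "common subsequence" as a contiguous run of identical lines, it ignores blank lines, and the names `intern_lines`, `longest_common_run` and `levenshtein` are made up for the example.

```python
from typing import Dict, List, Tuple


def intern_lines(files: Dict[str, List[str]]) -> Tuple[Dict[str, int], Dict[str, List[int]]]:
    """Give every distinct (whitespace-stripped) line one integer id and
    rewrite each file as a sequence of those ids."""
    line_ids: Dict[str, int] = {}
    sequences: Dict[str, List[int]] = {}
    for name, lines in files.items():
        seq: List[int] = []
        for line in lines:
            key = line.strip()
            if not key:
                continue  # assumption: blank lines carry no information
            seq.append(line_ids.setdefault(key, len(line_ids)))
        sequences[name] = seq
    return line_ids, sequences


def longest_common_run(a: List[int], b: List[int]) -> Tuple[int, int, int]:
    """Return (length, start_in_a, start_in_b) of the longest contiguous
    run of ids shared by both sequences (dynamic programming, O(len(a)*len(b)))."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[0]:
                    best = (cur[j], i - cur[j], j - cur[j])
        prev = cur
    return best


def levenshtein(a: List[int], b: List[int]) -> int:
    """Edit distance between two id sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute or match
        prev = cur
    return prev[-1]


# Two tiny made-up "JSP files" that share their middle two lines.
files = {
    "a.jsp": ["<html>", "<body>", "<p>hi</p>", "</body>"],
    "b.jsp": ["<div>", "<body>", "<p>hi</p>", "</div>"],
}
line_ids, sequences = intern_lines(files)
run = longest_common_run(sequences["a.jsp"], sequences["b.jsp"])
distance = levenshtein(sequences["a.jsp"], sequences["b.jsp"])
```

Comparing sequences of ids rather than raw text keeps each comparison to integer equality, which matters once the quadratic loops run over hundreds of files.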
I used to be pretty laissez-faire about it - obviously try to avoid duplication wherever possible, but if you need to just copy the occasional 15 lines of code from here to there to save an afternoon's refactoring, that's probably okay as long as you don't make a habit of it.
Then I started my current job.
The guy who wrote most of this codebase before me took the "premature optimization is the root of all evil" line to its ludicrous extreme. Example: there were at least five different places in the app that computed the size for a thumbnail of an uploaded graphic. This seemed like the kind of thing I could rationalize, until I realized that thumbnails from all 5 "paths" were being displayed on the same screen - and each function was doing the math a slightly different way, and getting slightly different results. They had all started as copy-pastes, but had each been hot-rodded over a year or so until we got to where I found it.
So, THAT all got refactored. And now I'm a dupe-removal fanatic.
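For illustration only, here is the sort of single shared helper that all five paths could call. The function name, the 128-pixel default and the rounding rule are assumptions made up for this sketch, not what the real code does.

```python
from typing import Tuple


def thumbnail_size(width: int, height: int, max_side: int = 128) -> Tuple[int, int]:
    """Scale (width, height) to fit inside a max_side x max_side box,
    preserving aspect ratio and never upscaling.

    Hypothetical helper: the point is that every caller shares one
    rounding rule instead of five slightly different ones.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    scale = min(max_side / width, max_side / height, 1.0)
    # Clamp to at least 1 pixel so extreme aspect ratios stay drawable.
    return max(1, round(width * scale)), max(1, round(height * scale))


landscape = thumbnail_size(1024, 768)
small = thumbnail_size(100, 50)
portrait = thumbnail_size(300, 600, max_side=150)
```

With one function, any future tweak to the math changes every thumbnail on the screen at once, so they cannot drift apart again.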
I consider it the single most important indicator of a good programmer. If you can write fully factored code--then almost by definition it's good code.
It seems like nearly all other programming practices are just ways to get your code DRY.
This is a bit of an overstatement, but not too much. Between being DRY and making your interfaces as stable and minimal as possible (Separation of Concerns), you're on your way to being an actual Software Engineer and not a Programmer/Hacker.
I always try to think first about why the code is duplicated. Most of the time, the answer is laziness/ignorance/etc., and I refactor. But, just once in a while, a case shows up where the duplication is actually valid. What I am talking about here is two pieces of code that are semantically unrelated, but just happen to have the same (or similar) implementation at the moment. Consider for instance business rules for completely unrelated (actual) processes. The rules may be equal one day, and then the next day one of them changes. You'd better hope that they are not represented by the same piece of code, or pray that the developer doing the modification can spot what is going on (unit tests, anyone?).
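A made-up example of what I mean. Both rules below are hypothetical, as are their names and numbers; they happen to have the same body today, but they belong to unrelated processes, so merging them would couple two things that only coincide by accident.

```python
from decimal import Decimal


def loyalty_discount(order_total: Decimal) -> Decimal:
    """Hypothetical marketing rule: 10% off orders of 100 or more."""
    if order_total >= 100:
        return order_total * Decimal("0.10")
    return Decimal("0")


def bulk_shipping_rebate(order_total: Decimal) -> Decimal:
    """Hypothetical logistics rule: 10% rebate on orders of 100 or more.

    Same arithmetic as loyalty_discount today, but it is owned by a
    different process and may change independently tomorrow.
    """
    if order_total >= 100:
        return order_total * Decimal("0.10")
    return Decimal("0")


discount = loyalty_discount(Decimal("150"))
rebate = bulk_shipping_rebate(Decimal("150"))
```

If marketing later moves its threshold to 200, only `loyalty_discount` changes, and a unit test on each rule pins down which one was meant to move.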