I think back to Joel Spolsky's article about never rewriting code from scratch. To sum up his argument: The code doesn't get rusty, and while it may not look pretty after many maintenance releases, if it works, it works. The end user doesn't care how pretty the code is.
You can read the article here: Things You Should Never Do
I've recently taken over a project and, after looking through the code, it's pretty awful. I immediately thought of prototypes I had built before, about which I had explicitly stated that they should not be used in any production environment. But of course, people don't listen.
The code is built as a website, has no separation of concerns, no unit testing, and code duplication everywhere. No Data layer, no real business logic, unless you count a bunch of classes in App_Code.
I've recommended to the stakeholders that, while we should keep the existing code and do bug-fix releases and some minor feature releases, we should start rewriting it immediately with Test Driven Development in mind and with clear separation of concerns. I'm thinking of going the ASP.NET MVC route.
My only concern is, of course, the length of time it might take to rewrite from scratch. It's not especially complicated: a pretty run-of-the-mill web application with membership, etc.
Have any of you come across a similar problem? Any particular steps you took?
UPDATE:
So... what did I end up deciding to do? I took Matt's approach and decided to refactor many areas.
- Since App_Code was getting rather large and thus slowing down the build time, I removed many of the classes and converted them into a Class Library.
- I created a very simple Data Access Layer, which contained all of the ADO calls, and created a SqlHelper object to execute these calls.
- I implemented a cleaner logging solution, which is much more concise.
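For readers who want a picture of what that second step looks like, here is a minimal sketch of the shape. It is not the original code: the project used ADO.NET, while this uses Python's standard sqlite3 module purely for illustration, and MemberRepository, the members table and the method names are all hypothetical. Only the SqlHelper name comes from the description above.

```python
import sqlite3
from contextlib import closing


class SqlHelper:
    """Hypothetical helper: the only place that talks to the database driver."""

    def __init__(self, connection):
        self._conn = connection

    def execute(self, sql, params=()):
        """Run a statement that changes data, commit, and return the driver's rowcount."""
        with closing(self._conn.cursor()) as cur:
            cur.execute(sql, params)
            self._conn.commit()
            return cur.rowcount

    def query(self, sql, params=()):
        """Run a SELECT and return all rows as a list of tuples."""
        with closing(self._conn.cursor()) as cur:
            cur.execute(sql, params)
            return cur.fetchall()


class MemberRepository:
    """Hypothetical data access class: pages call this instead of writing SQL inline."""

    def __init__(self, helper):
        self._db = helper

    def create_schema(self):
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS members "
            "(id INTEGER PRIMARY KEY, email TEXT UNIQUE)"
        )

    def add(self, email):
        # Parameterized query: values never get concatenated into the SQL text.
        self._db.execute("INSERT INTO members (email) VALUES (?)", (email,))

    def find_by_email(self, email):
        rows = self._db.query(
            "SELECT id, email FROM members WHERE email = ?", (email,)
        )
        return rows[0] if rows else None
```

The point of the split is that the SQL lives in one small class per entity and the driver calls live in one helper, so duplicated query code in the pages can be deleted one call site at a time.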
While I no longer work on this project [funding, politics, blah blah], I think it gave me some enormous insight into how badly some projects can be written, and into the steps one developer can take to make things a lot cleaner, more readable and just flat-out better through small, incremental changes over time.
I disagree with that article somewhat. For the most part Joel is correct but there are counter-examples that indicate sometimes (even if rarely) a rewrite is a good idea. E.g.,
I believe Joel's argument is mainly based on fairly well-written code in the existing version that could be improved with hindsight. By all means, if the code you inherited is really that bad, push for a rewrite--there's some scary stuff out there. If it's at all tolerable and works reasonably well, phase in the new stuff at a slower pace.
Instead of a complete rewrite from scratch you want to start refactoring the code base in small steps while introducing unit tests. For example
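One common way to take such a small step is a characterization test: pin down what the legacy routine does today, then refactor behind that test. The sketch below is in Python for brevity and is only an illustration; legacy_shipping_cost, its pricing rules and shipping_cost are hypothetical stand-ins, not code from the project in question.

```python
import unittest


def legacy_shipping_cost(weight_kg, express):
    """Stand-in for an untested legacy routine (hypothetical rules)."""
    cost = 0
    if weight_kg > 0:
        if weight_kg <= 5:
            cost = 4
        else:
            cost = 4 + (weight_kg - 5) * 1
        if express:
            cost = cost * 2
    return cost


def shipping_cost(weight_kg, express):
    """Refactored version: intended to behave identically, with flatter control flow."""
    if weight_kg <= 0:
        return 0
    base = 4 + max(0, weight_kg - 5)
    return base * 2 if express else base


class ShippingCostCharacterization(unittest.TestCase):
    # Inputs chosen to hit every branch of the legacy routine.
    CASES = [(0, False), (-1, True), (3, False), (5, True), (8, False), (8, True)]

    def test_refactored_matches_legacy(self):
        for weight, express in self.CASES:
            with self.subTest(weight=weight, express=express):
                self.assertEqual(
                    shipping_cost(weight, express),
                    legacy_shipping_cost(weight, express),
                )


if __name__ == "__main__":
    unittest.main()
```

The test asserts nothing about what the correct price should be, only that the new code agrees with the old. Once it passes, the legacy function can be deleted and the recorded values kept as ordinary regression tests.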
There's an old adage that says:
The key to knowing when to re-write lies in there. Does the system currently do what you want? If the answer is yes, slow but steady improvements are your best bet. If the answer is no, a re-write is what you want.
Going back to Joel's essay, he talks about code that's messy, but software that is reliable and delivers the expected value. Suppose instead that you have unreliable code, full of major bugs, that doesn't cover all your use cases: things that were supposed to be there either don't work or are just missing. In this case, all the little hairs growing out of it aren't bug fixes, but cancer.
There is also a conflicting statement in economics that says,
Sunk costs, according to Wikipedia (https://en.wikipedia.org/wiki/Sunk_cost):
When sunk costs are coupled with political pressure or personal ego (what manager wants to be the one to admit that they made a poor decision or didn't properly monitor results, even if it was unavoidable or out of their immediate control?), it leads to a situation called escalation of commitment (https://en.wikipedia.org/wiki/Escalation_of_commitment), which is defined as:
How does this apply to code?
Having had a rather long career as a software developer now, one common thread I've found is that, when faced with a challenging or ugly codebase (even if it is our own from two years ago), our first instinct is to want to throw out the old, ugly code and rewrite it from scratch. If it is a familiar codebase, then this is usually born from the fact that we are now much more familiar with the pitfalls of the project and business requirements than we were when we started the project, so we (perhaps subconsciously) yearn for the opportunity to fix our past sins by erasing them with perfection. If it is an unfamiliar codebase, we often tend to over-simplify the challenges faced by the original developers, glossing over "minor details" in favor of "big-picture" architectural-level thinking, and often blowing budgets and timeframes due to a lack of understanding of the complex minutiae of the business cases that the code was originally meant to solve.
Then there is the whole concept of technical debt, which, just like financial debt, CAN and WILL accrue to the point that a codebase becomes technically insolvent. More and more time and resources are invested into troubleshooting bugs, extinguishing fires, and overly-challenging improvements, to the extent that forward progress becomes expensive, difficult, and perilous. Projects take longer and longer due to defects and developers being pulled off project work to fix production issues. After-hours "incidents" start becoming expected operation instead of some rare blip. Instead of stepping back and starting to do things right to increase our future productivity (and quality of life), we find ourselves in a position where we are forced to add more and more technical debt in order to meet deadlines - the technical equivalent of taking cash advances on a credit card to make a minimum payment on another card.
That all being said, it means neither that we should rewrite whenever possible, nor that we should avoid rewriting working code at all costs. Both extremes are potentially wasteful, and the latter does tend to lead to escalation of commitment (because "at all costs" means with total disregard for costs, even if those costs completely outstrip the benefits). What needs to occur is an objective assessment of the costs and benefits of rewriting code versus making incremental improvements. The challenge is finding someone with both the expertise and objectivity to make that decision properly. We developers are generally biased towards rewriting because it tends to be a lot more interesting and engaging than working on some crappy legacy codebase. Business managers tend to be biased in the other direction because a rewrite imposes some unknowns with little perceivable immediate benefit. The result is generally the absence of a real decision, which then defaults to continuing to dump hours into existing code until some circumstance necessitates a directional shift (or the developer covertly rewrites the code, and usually gets a spanking for it).
I've worked on codebases that were somewhat salvageable, albeit ugly. They didn't follow established practices or standards, didn't use patterns, weren't pretty, but they performed their intended functions reasonably well and were flexible enough that they could be modified to meet anticipated future needs for the expected life of the application. While not glamorous, it was perfectly acceptable to keep this code alive while making incremental improvements when the opportunity arose. Doing otherwise would have produced little benefit other than looking pretty. I would say that most code about which the "should I rewrite this?" question arises falls under this category, and I find myself explaining to the junior developers on the team that, while it would be great fun to rewrite YetAnotherLineOfBusinessApp in {insert whizzbang framework here}, it is neither necessary nor desirable, and here are some ways we can improve it...
I've also worked on codebases that were hopeless. These were applications that barely launched in the first place, usually way behind schedule and in a reduced-functionality state. They were written in a way that no one but the original developer would have any chance of understanding what the code ultimately does. I refer to this as "read-only" code. Once it is written, any attempted change potentially results in systemic indecipherable failure of unknown origin, leading to panicked wholesale rewrites of massive monolithic code constructs that serve no purpose other than to educate the current developer on what is actually happening to a variable cleverly named obj_85 by the time execution reaches line 1,209, nested 7 levels deep in if... else..., switch, and foreach... statements somewhere in the DoEverythingAndMakeCoffee(...) method. Attempts to refactor this code result in failure. Every path you follow leads to another challenge, and more paths, and then paths that branch, and then circle back to a previous path, and after two weeks of heads-down refactoring of a single class you realize that, while maybe better encapsulated, the new code is nearly as wacky and obfuscated as the old code, probably contains even more bugs because the original intent of what you refactored was totally unclear, and, not knowing what exact business cases led to the original disaster in the first place, you can't be sure you've fully replicated the functionality. Progress is almost non-existent because translation of the codebase is nearly impossible, and something as innocent as renaming a variable or using the proper type produces an exponential amount of unintended side effects.
Attempting to improve codebases like the above is an exercise in futility. Refactoring usually results in an 80% rewrite anyway, and the end result is nowhere near an 80% improvement. You end up with something that is very inconsistent, and the new code has a lot of compromises that had to be implemented in the interest of interoperability with legacy code (half of which were unnecessary because the legacy code that the new code needed to interoperate with later gets refactored out anyway). There are only two paths that can be followed: continue to accrue technical debt by hacking in "fixes" and modifications while hoping that the application is deprecated (or you get transferred to another project) before it collapses under its own weight, or have someone make the business decision and take the risk of doing a complete rewrite. I hate both of these options, because either usually means waiting until something critical has failed or a project is way behind schedule, and you then spend the next three months of evenings and weekends trying to get something breathing that probably never should have been alive in the first place.
So, how do you decide?
The book Facts and Fallacies Of Software Engineering states this fact: "Modification of reused code is particularly error-prone. If more than 20 to 25 percent of a component is to be revised, it is more efficient and effective to rewrite it from scratch." The numbers come from some statistical studies performed on the subject. I think the numbers may vary due to the quality of the code base, so in your case, it seems to be more efficient and effective to rewrite it from scratch by taking this statement into account.
I have had such an application, and the rewrite was very rewarding. However, you should try to avoid the "improvement" trap.
When you rewrite everything, it is very tempting to add new features and fix some long-standing issues you didn't have the guts to touch. This can lead to feature creep and also extend the time needed for rewrite enormously.
Make sure you decide what exactly will be changed and what will only be rewritten - in advance.