I'm going to refactor certain parts of a huge code base (18000+ Java classes). The goal is to be able to extract the lower layers as independent libraries that can be reused in other projects, which currently use duplicates of this code base. One part in particular should be refactored into a framework independent of business logic. Ultimately I would like the code to have a clean architectural layering.
I've looked at the code with a tool called Structure 101 for Java and found lots (!) of architectural layering issues where lower layers reference upper layers.
I don't want to simply start messing with the code; I'd rather come up with a reasonable strategy for tackling this problem. What things should I keep in mind?
I'm thinking about at least taking small steps. I'm also thinking about having unit tests in place, but that requires creating them, since there are none.
Any thoughts on this?
My idea is that once the testing infrastructure is set up, you could write code generation tools for test cases, provided the common features of your testing code can be abstracted. Static code analysis tools might also be useful add-ons alongside the visualization tools. Sorry, it's only an idea; I can't name specific tools.
You should also take a look at Working Effectively with Legacy Code by Michael Feathers:
http://www.amazon.com/Working-Effectively-Legacy-Robert-Martin/dp/0131177052/ref=sr_1_1?ie=UTF8&s=books&qid=1242430219&sr=8-1
I think one of the most important things you can put in place to facilitate this is a set of tests that ensure everything still works after refactoring or pulling code out into separate modules. Complement this with a continuous integration system that runs your tests whenever you check something in.
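One kind of check that could run in such a build is a simple layering test that fails when a lower layer imports from an upper one. The following is only a minimal sketch: the class name `LayeringCheck`, the method `findForbiddenImports`, and the `com.example.*` packages are all hypothetical, and it looks only at `import` statements, so it will miss references made through fully qualified names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LayeringCheck {

    /**
     * Returns the imported names in the given source text that start with
     * any of the forbidden (upper-layer) package prefixes.
     * Hypothetical helper: only plain and static import lines are inspected.
     */
    public static List<String> findForbiddenImports(String source, List<String> forbiddenPrefixes) {
        List<String> violations = new ArrayList<>();
        for (String line : source.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.startsWith("import ")) {
                continue;
            }
            // Strip "import", an optional "static", and the trailing semicolon.
            String imported = trimmed.substring("import ".length()).trim();
            if (imported.startsWith("static ")) {
                imported = imported.substring("static ".length()).trim();
            }
            if (imported.endsWith(";")) {
                imported = imported.substring(0, imported.length() - 1).trim();
            }
            for (String prefix : forbiddenPrefixes) {
                if (imported.startsWith(prefix + ".")) {
                    violations.add(imported);
                    break;
                }
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Assumed example: a class in a lower "core" layer importing from an upper "ui" layer.
        String lowerLayerSource = String.join("\n",
                "package com.example.core.util;",
                "import java.util.List;",
                "import com.example.ui.OrderScreen;",
                "public class Formatter {}");
        List<String> violations =
                findForbiddenImports(lowerLayerSource, Arrays.asList("com.example.ui"));
        System.out.println(violations); // prints [com.example.ui.OrderScreen]
    }
}
```

Wrapped in a unit test that reads the real source files of a lower-layer module, a check like this would stop new upward references from creeping in while the existing ones are being removed.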
First thing: good luck, you're going to need it. This is potentially a HUGE job you've come upon. It sounds very familiar to me; I've worked on similar things in the past.
One thing to think about: before you start refactoring at all, I'd strongly consider putting in place an extensive testing framework. The reason is this: with good unit tests and regression tests, you can begin making changes without worrying TOO much about breaking existing functionality. (There's always some concern, but...)
Next, I'd look at slicing off distinct "vertical" slices of functionality, and see if you can write distinct unit and integration tests for them; once that is done, I'd jump in and start work on the refactor. While the slice may be very small at first, just the process of isolating it and then writing integration and unit test code for it will get you a lot of experience with the existing code base. And if you manage to make that one little bit better initially, then you're ahead by that much.
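When no tests exist, a common way to start is to write tests that simply record what the code does today, right or wrong (sometimes called characterization tests). Here is a minimal sketch in plain Java, so it stays self-contained; in a real project you would more likely use a test framework such as JUnit. `LegacyPriceCalculator` and its discount rule are invented stand-ins for some untested legacy method.

```java
public class CharacterizationSketch {

    // Hypothetical stand-in for an existing, untested legacy method.
    public static class LegacyPriceCalculator {
        public static int priceInCents(int quantity, int unitPriceInCents) {
            int total = quantity * unitPriceInCents;
            if (quantity >= 10) {
                total -= total / 10; // undocumented bulk discount found in the old code
            }
            return total;
        }
    }

    // Tiny assertion helper so the sketch needs no test framework.
    static void expectEquals(int expected, int actual, String description) {
        if (expected != actual) {
            throw new AssertionError(description + ": expected " + expected + " but got " + actual);
        }
    }

    // The expected values were obtained by running the current code once,
    // not from a specification. They pin down today's behavior.
    public static void runCharacterizationTests() {
        expectEquals(500, LegacyPriceCalculator.priceInCents(5, 100), "small order");
        expectEquals(900, LegacyPriceCalculator.priceInCents(10, 100), "bulk order gets 10% off");
        expectEquals(0, LegacyPriceCalculator.priceInCents(0, 100), "empty order");
    }

    public static void main(String[] args) {
        runCharacterizationTests();
        System.out.println("characterization tests passed");
    }
}
```

If a later refactoring changes any of these results, the test fails, and you can then decide whether the change was intended.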
After you've done that, start looking at potentially larger blocks of functionality to refactor. If it isn't possible to get clean blocks of functionality to refactor, I'd start looking at small chunks; if you can find a small (sometimes VERY small) chunk of code to extract, unit test, and refactor, you're moving forward. This may seem like very slow progress at times, and it will be if you have a really large project, but you WILL be making a dent.
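As an illustration of what one such small chunk might look like when the obstacle is a lower layer calling into an upper one: a common technique is to let the lower layer define an interface and have the upper layer implement it, so the dependency points downward. This is only a sketch under that assumption; `AuditSink`, `FileStore`, and `BusinessAuditLog` are hypothetical names, not classes from the code base in question.

```java
import java.util.ArrayList;
import java.util.List;

public class InversionSketch {

    // --- Lower layer: owns the interface, knows nothing about business code ---
    public interface AuditSink {
        void record(String event);
    }

    public static class FileStore {
        private final AuditSink audit;

        public FileStore(AuditSink audit) {
            this.audit = audit;
        }

        public String save(String name) {
            // Previously this might have called an upper-layer class directly.
            audit.record("saved " + name);
            return name + ".dat";
        }
    }

    // --- Upper layer: implements the lower layer's interface ---
    public static class BusinessAuditLog implements AuditSink {
        public final List<String> entries = new ArrayList<>();

        @Override
        public void record(String event) {
            entries.add("[business] " + event);
        }
    }

    public static void main(String[] args) {
        BusinessAuditLog log = new BusinessAuditLog();
        FileStore store = new FileStore(log);
        System.out.println(store.save("report")); // prints report.dat
        System.out.println(log.entries);          // prints [[business] saved report]
    }
}
```

With this shape, `FileStore` and `AuditSink` could move into a standalone library, and a test can pass in a trivial `AuditSink` instead of the real business class.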
But in general, think of putting in place tests first to confirm expected functionality. Once those tests are in place, you can refactor with confidence (not perfect confidence, but better than nothing) that you aren't breaking things. Start small, and build on the techniques that reveal themselves out of the existing codebase. It's a long slog, but you'll get there eventually, and the codebase will be better for it.