I am a junior software engineer who've been given a task to take over a old system. This system has several problems, based on my preliminary assessment.
- spaghetti code
- repetitive code
- classes with 10k lines and above
- misuse and over-logging using log4j
- bad database table design
- Missing source control -> I have setup Subversion for this
- Missing documents -> I have no idea of the business rule, except to read the codes
How should I go about it to enhance the quality of the system and resolve such issues? I can think of using static code analysis software to resolve any bad coding practice.
However, it can't detect any bad design issues or problems. How should I go about resolving these issues step by step?
Your issue #7 is by far the most important. As long as you have no idea how the system is supposed to behave, all technical considerations are secondary. Everyone is suggesting unit tests - but how can you write a useful test if you can't distinguish between wanted and unwanted behaviour?
So before you start touching the code, you have to understand the system from the user's point of view: talk to users, observe them using the system, write documentation on the use case level.
Yes, I am seriously suggesting that you spend days, more likely weeks, without changing a single line of code. Because right now, any change you make is likely to break things without you realizing it.
Once you understand the app, you'll at least know which functionality is important to test (manually or automated).
Write some unit tests first, and make sure they pass. Then with each refactoring change you make, just keep making sure the tests keep passing. Then you can be confident that your application behaviour to the outside world hasn't changed.
This also has the added benefit that the tests will always be there, so for any future changes the tests should still pass, guarding against any regressions in the new changes.
Get and read Working Effectively With Legacy Code. It deals exactly with this situation.
As others have also advised, for refactoring you need a solid set of unit tests. However, legacy code is typically very difficult to unit test as is, since it has not been written to be unit testable. So you need to refactor first to allow unit testing, which would allow you to start refactoring... a bad catch.
This is where the book will help you. It gives lots of practical advice on how to make badly designed code unit testable with the minimal, and safest possible, code changes. Automatic refactorings can also help you here, but there are tricks described in the book which can only be done by hand. Then once the first set of unit tests are in place, you can start gradually refactoring towards better, more maintainable code.
Update: For hints on how to take over legacy code, you may find this earlier answer of mine useful.
As @Alex noted, unit tests are also very useful to understand and document the actual behaviour of the code. This is especially useful when documentation about the system is nonexistent or outdated.
Design issues are very difficult to catch. The first place to start is understanding the design of the application. I find it useful to diagram using either UML or a process flow diagram, anything works that communicates the design and working for the application.
From there I go into more detail, and ask myself the questions "Would I have done it this way", what other options are there. It is easy to see code-debt, i.e. the debt that we get from making bad choices, as always bad, but sometimes there are other factors involved like budget, time, availability of resources etc. Their you have to ask the question if it is worth refactoring a working but bad designed application.
If there are many upcoming new features, changes, bug fixes, etc I would say it is good to refactor, but if the application rarely changes and is stable, then maybe leaving it as is is a better approach.
Another sidepoint to note, is that if the code is used by another application as a service or module, then refactoring might first mean create a stub around the code that servers as the interfaces, once that is defined clearly and has unit test to prove it work. You can choose any technology to fill in the details.
Working Effectively With Legacy Code might be helpful.
As others have noted, don't change something that works just to make it prettier. The risk that you will introduce errors is great.
My philosophy is: As I have to make changes to satisfy new requirements or to fix reported bugs, I try to make the piece of code that I have to change a little cleaner. I'm going to have to test the changed code anyway, so now is a good time to do a little clean-up at small additional cost.
Fundamental design changes are the toughest and must be saved for occasions where you have to make a big enough change that you would be testing all the changed code anyway.
Changing bad database design is hardest of all because the poorly designed tables are likely used by many programs. Any change to the database requires changing every program that reads or writes it. The best way to accomplish this is usually to try to reduce the number of places that access any given part of the database. To take a simple example: Suppose there are 20 places that read through customer records and calculate the customer account balance. Replace this with one function that reads the database and returns the total, and twenty calls to that function. Now you can change the schema for the customer records and there is only one piece of code to change instead of 20. The principle is simple enough, but in practice it is unlikely that every function that accesses a given record is doing the same thing. Even if the original programmer was clumsy enough to write the same code 20 times (not unlikely -- I've seen plenty of that), the real situation is probably not that he wrote 1 function 20 times, period, but that he wrote function A 20 times, function B 12 times, function C 4 times, etc.