I'm having a hard time trying to refactor some pieces of code with many branches. There are many if/then/else blocks, some of them are nested.
Are there any tricks that can be used to refactor the code without wasting a lot of time trying to understand every minor aspect of the functionality first?
For now I'm basically using boolean algebra (De Morgan's laws). I'm trying to modify the conditions in if statements so I could pop up small code snippets outside if/then/else blocks. This helps somehow, but the code still remains branched. I know at some point I will eventually have to break the biggies into smaller class methods, but this is complicated because the code contains calls to many other class methods and has a lot of local-scope variables, so I will have to deal with passing many arguments to new methods. I wonder If I could use some other tricks to improve the code quality before I start breaking it into distinct smaller pieces (class methods).
In my experience, it's almost always more productive to functionally decompose (break things into smaller functions that do one thing and do it well) before getting into branches and equivalent logical expressions.
Among other things, the act of functionally decomposing will lead to your understanding what the code does, which leads to further chances to decompose. You want to get to functions that take only a few arguments, and return no, one or few values.
And by decomposing, you'll gain a better understanding of what parts are algorithmically essential (it must be done this way to be correct) and what parts are merely implementation details (which must be done, but can be done in various different ways without changing the output).
Often, you'll find that many of those deeply nested
if
s are either unnecessary, or are repeated redundantly, or are similar enough they can be reduced to one or a few functions if you factor out minor differences in otherwise repeated code.In other words, you want to get a handle not on the minutia of the code, but the abstract thing the code is dealing with. Many many problems boil down to some data structure (a list, a queue, a tree, a set) and some operation(s) on that structure. To the extent that you can tease apart data structures and algorithms, you can gain a level of abstraction that simplifies things. Or you can discover that whatever you have to do is better suited to other structures or algorithms, and then you can blow away a lot of cruft.
I remember years ago a colleague who always wrote database cursors, to do anything, because that's all he knew how to do. Setting up and tearing down a cursor requires some boilerplate, and a loop with some error checking, so his code always looked superficially complicated. He'd do it in a stored proc, which of course added more boilerplate.
Now, as it happened, this was in Sybase T-SQL, which has a global for errors and a global for cursor status, both of which he'd check twice, once when he got the first cursor row, then once in a loop that iterated over all the other rows. And he consistently confused the error variable (
@@error
) with the cursor state variable (@@sqlstatus
, which is zero on success (got a cursor row), 1 on failure, and 2 if there are no more rows in the cursor). His code worked on the nominal path, because both were zero if he'd gotten a row, and when he tried to get the row after the last one, he'd get an error. So if you looked closely at his code, you'd groan (once again!) and fix that for him.But then you'd look closer, and see that he was doing something like cursoring through the set of all rows
where x = 1
and for each row settingy = y * 2
, and you'd end up telling him, "just write an update statement!".Your correcting of his checking of globals, while correct, didn't address the real problem, which was that he was using cursor and a stored proc for no good reason, when he just could have sent an update statement from the client code.
Figuring that out was more difficult, because instead of just looking at the local use of the globals, you had to look two places: where the cursor was declared (
declare cursor_foo cursor for select * from table where x = 1 for update;
) and twenty lines down where the update happened (update table set y = y * 2 where current of cursor_foo
). (And all of these would be multi-line in a very boilerplate way.)Figuring it out was also more difficult because you just assumed no one would go through all this just to do an update; surely all this boilerplate had to be because something special was happening in the update, right? Or because the
where
predicate was dynamic or something? So you'd look at this, and being a modest coder who respects his colleagues, your first instinct would be, "what am I missing here, there must be a reason a cursor was used?"(Despite both me and his boss explaining the difference between @@error and @@sqlstatus, he just never got it, much less the idea that he could almost always just do an
update
; he thought in terms of procedural code, and never "got" databases, despite ostensible experience as a "database programmer".)The lesson is to not get caught up in minutia until you have confirmed there's a need for the minutia (the implementation details) in the first place. By decomposing your code first, you have a better chance of understanding the abstractions you're dealing with, and really improving the code.
Does the code currently have sufficient unit tests so that you will know if your change breaks something or not? If not, write them first before doing any refactoring. Refactoring with out unit tests is a little bit like stepping out into the street without looking first.
If the code has unit tests, you can lean on them for understanding. If not, then you're going to have to understand the code well enough to write the unit tests. This probably does not mean that you need to get to the level of understanding the minutiae. But you will have to understand the expectations under various different conditions. You need to know the nominal path through the code. You need to understand the various alternative and error paths.
I honestly don't see a circumstance under which you won't become intimately familiar with this code -- or at least the intent behind it. Once you get to that level, hopefully, you'll be able to not only replace the code bits but actually design a better solution.
Good luck.
Before I start to tear in the code I create a unit test framework to make sure that I don't break anything. Then I start moving all the cases into separate functions one by one. The code suddenly becomes much more readable since I try to give the functions understandable names which also forces me to understand the code. Then once that is done I start looking at the conditions to see if they can be re-factored as well, it often possible since the sheer amount of lines before may have obscured things. (I am think about a function I inherited from somebody which was basically 500 lines of if-then-else)
Figure out if your branching is "table" based or functionality based.
Certain things (e.g., parsers) come down to what is essentially one big switch statement (with mild adjustments for sparseness), in which case one huge switch statement may be the most readable option.
Otherwise, try to think functionally. Once you've decomposed and decomposed into function calls, you have the building blocks to rearrange things if necessary, but you may find that it's no longer necessary. This is because the function names you'll choose should follow the "Do one thing" principle, so you're already thinking when you refactor into these functions.