Automated GOTO removal algorithm

2019-03-12 22:29发布

I've heard that it's been proven theoretically possible to express any control flow in a Turing-complete language using only structured programming constructs, (conditionals, loops and loop-breaks, and subroutine calls,) without any arbitrary GOTO statements. Is there any way to use that theory to automate refactoring of code that contains GOTOs into code that does not?

Let's say I have an arbitrary single subroutine in a simple imperative language, such as C or Pascal. I also have a parser that can verify that this subroutine is valid, and produce an Abstract Syntax Tree from it. But the code contains GOTOs and Labels, which could jump forwards or backwards to any arbitrary point, including into or out of conditional or loop blocks, but not outside of the subroutine itself.

Is there an algorithm that could take this AST and rework it into new code which is semantically identical, but does not contain any Labels or GOTO statements?

3条回答
smile是对你的礼貌
2楼-- · 2019-03-12 22:52

I think you want to read Taming Control Flow by Erosa and Hendren, 1994. (Earlier link on Google scholar).

By the way, loop-breaks are also easy to eliminate. There is a simple mechanical procedure involving the creating of a boolean state variable and the restructuring of nested conditionals to create straight-line control flow. It does not produce pretty code :)

If your target language has tail-call optimization (and, ideally, inlining), you can mechanically remove both break and continue by turning the loop into a tail-recursive function. (If the index variable is modified by the loop body, you need to work harder at this. I'll just show the simplest case.) Here's the transformation of a simple loop:

for (Type Index = Start;        function loop(Index: Type):    
     Condition(Index);              if (Condition)
     Index = Advance(Index)){           return                      // break
   Body                             Body
}                                   return loop(Advance(Index))     // continue
                                loop(Start)

The return statements labeled "continue" and "break" are precisely the transformation of continue and break. Indeed, the first step in the procedure might have been to rewrite the loop into its equivalent form in the original language:

{
    Type Index = Start;
    while (true) {
        if (!Condition(Index))
            break;
        Body;
        continue;
    }
}
查看更多
劫难
3楼-- · 2019-03-12 22:52

In principle, it is always possible to do this, though the results might not be pretty.

One way to always eliminate gotos is to transform the program in the following way. Start off by numbering all the instructions in the original program. For example, given this program:

start:
    while (true) {
        if (x < 5) goto start;
        x++
    }

You could number the statements like this:

0 start:
1     while (x < 3) {
2         if (x < 5) goto start;
3         x++
      }

To eliminate all gotos, you can simulate the flow of the control through this function by using a while loop, an explicit variable holding the program counter, and a bunch of if statements. For example, you might translate the above code like this:

int PC = 0;
while (PC <= 3) {
    if (PC == 0) {
         PC = 1;             // Label has no effect
    } else if (PC == 1) {
         if (x < 3) PC = 4;  // Skip loop, which ends this function.
         else PC = 2;        // Enter loop.
    } else if (PC == 2) {
         if (x < 5) PC = 0;  // Simulate goto
         else PC = 3;        // Simulate if-statement fall-through
    } else if (PC == 3) {
         x++;
         PC = 1;             // Simulate jump back up to the top of the loop.
    }
}

This is a really, really bad way to do the translation, but it shows that in theory it is always possible to do this. Actually implementing this would be very messy - you'd probably number the basic blocks of the function, then generate code that puts the basic blocks into a loop, tracks which basic block is currently executing, then simulates the effect of running a basic block and the transition from that basic block to the appropriate next basic block.

Hope this helps!

查看更多
女痞
4楼-- · 2019-03-12 23:02

I use either/both Polyhedron's spag and vast's 77to90 to begin the process of refactoring fortran and then converting it to matlab source. However, these tools always leave 1/4 to 1/2 of the goto's in the program.

I wrote up a goto remover which accomplishes something similar to what you were describing: it takes fortran code and refactors all the remaining goto's from a program and replacing them with conditionals and do/cycle/exit's which can then be converted into other languages like matlab. You can read more about the process I use here:

http://engineering.dartmouth.edu/~d30574x/consulting/consulting_gotorefactor.html

This program could be adapted to work with other languages, but I have not gotten than far yet.

查看更多
登录 后发表回答