How to implement LOOP in a FORTH-like language int

I'm writing a simple stack-based language in C and was wondering how I should go about implementing a loop structure of some kind, and/or lookahead symbols. Since the code is a bit long for this page (over 200 lines) I've put it in a GitHub repository.

EDIT: The main program is in file stack.c.

EDIT: The code just takes in input in words, kind of like FORTH. It uses scanf and works left to right. Then it uses a series of ifs and strcmps to decide what to do. That's really it.

回答1:

The Forth approach is to add a separate loop stack alongside the data stack. You then define operations that work with this loop stack. For example:

5 0 DO I . LOOP

Will print

0 1 2 3 4

The way this works is:

DO moves the index (0) and the control (5) over to the loop stack.
I copies the top of the loop stack to the data stack.
LOOP increments the index (top of loop stack). If the index is less than the control (one below the top of loop stack), then it reruns the commands from DO back to LOOP. If the index is >=, then it pops the index and control from the loop stack, and control resumes as normal.

回答2:

The obvious way to add control structures to a stack-based language is to add a "control stack". I'll describe the Postscript mechanism because that's what I know, but Forth has some analogous behavior (with subtle differences, to be sure).

The simplest controlled loop is the repeat loop. Technically, an infinite loop is simpler, but to use it requires an explicit exit command, and explaining that would complicate things.

The syntax for repeat is:

int proc  **repeat**  -

So when the repeat command begins it expects to find a procedure on the top of the operand stack (this is the loop body to be executed), and an integer just below the procedure (the number of times to execute the procedure).

Since there is also an execution stack (I think Forth calls it the "return" stack), a command can "execute" other commands by pushing them on the execution stack, thus scheduling other commands to be executed immediately after the current command is finished, but before resuming the caller of the current command. That's a long sentence, but the key idea is there.

As a concrete example, assume the interpreter is executing from the input stream. And the stacks look like this:

operand: -empty-
execute: <stdin>

The user types 2 {(Hello World!\n) print} repeat.

The integer 2 gets pushed on the stack.

operand: 2
execute: <stdin>

The quoted(*) procedure body gets pushed on the stack.

operand: 2 {(Hello World!\n) print}
execute: <stdin>

The repeat command gets executed.

operand: 2 {(Hello World!\n) print}
execute: <stdin> repeat

Repeat: expects operands: int proc
    if int<0 Error
    if int==0 return //do nothing
    push `repeat` itself on exec stack
    push quoted proc on exec stack
    push int-1 on exec stack
    push "executable" proc on exec stack

Executing the repeat procedure (once) leaves the stacks like this:

operand: -empty-
execute: <stdin> repeat {(Hello World!\n) print} 1 **{(Hello World!\n) print}**

The interpreter executes the procedure on top of the exec stack, printing "Hello World!", then finds an integer, which it pushes on the op stack.

operand: 1
execute: <stdin> repeat {(Hello World!\n) print}

The interpreter executes a quoted procedure by pushing on the op stack.

operand: 1 {(Hello World!\n) print}
execute: <stdin> repeat

And we're back at the beginning! Ready for the next iteration (or termination if the integer has fallen to zero).

Hope this helps.

Edit;

After actually looking at your code, I have a different suggestion, perhaps a springboard toward something like I describe above. I think you should forget the code for a while and focus on data structures. You've got a nice table for variables, but all the commands are embedded literals scattered through the code!

If you make your table hold variable record types, you can use the same lookup mechanism for both (and even move up to a hash or a ternary search tree (my current favorite)). I have in mind something like this:

struct object {
    int type;
    union {
        int i;
        void (*command)(context *);
    } u;
};

struct dict {
    int sz;
    struct rec {
        char *key;
        object val;
    }  data[]; //think resizable!
};

This way each command goes in its own function. The bonus is: small functions. You should try to make your functions small enough that you can see the whole thing on the screen at the same time. Scanning the whole thing at once allows your right brain to do some of the work of detecting problem areas.

Then you'll need another type to hold sequences of commands. An array or list should work. When you can define sequences, you can repeat sequences much more easily.

The bonus here is you're only one construct away from Turing Complete. Sequences, Iterations, Decisions (or Selection): that's all you need to program any computable function!

*. As discovered by the commentator, Postscript does not actually have the same concept of quoting that I use here in my description. Here I'm borrowing the concept from Lisp terminology. Postscript has instead a literal/executable flag that can be set cvx cvlit or tested xcheck. An executable array on the execution stack will be executed. So for the "quoted" procedure-body, it's actually a literal procedure-body (ie. an array). Because of this, repeat must also push a call to cvx to be executed before the next iteration. So, a more correct pseudocode for the postscript implementation of repeat is:

Repeat: expects operands: int proc
    if int<0 Error
    if int==0 return //do nothing
    push `repeat` itself on exec stack
    push 'cvx' on the exec stack
    push cvlit(proc) on exec stack
    push int-1 on exec stack
    push "executable" proc on exec stack

This performs the necessary flag-twiddling to pass the procedure as data on the execution stack.

Another way that I've seen this control structure implemented is with two functions, repeat the same entry-point, and an internal continuation operator which in theory doesn't need a name. I think ghostscript calls it something like @repeat-continue@. With a separate continue function, you don't have to be quite so careful with the order of stuff on the stack, and you don't need to twiddle the literal flag. Instead, you can store some persistent data below the recursive call on the exec stack; but you have to clean it up too.

So an alternate repeat would be:

int proc  **repeat**  -
    if int<0 Error
    if int==0 return //do nothing
    push null on exec stack   <--- this marks our "frame"
    push int-1 on exec stack
    push proc on exec stack
    push '@repeat-continue' on exec stack
    push executable proc on exec stack

The the continuation also has a simpler job.

@repeat-continue
    peek proc from exec stack
    peek int from exec stack
    if int==0 clear-to-null and return
    push '@repeat-continue' on exec stack
    push executable proc on exec stack

Finally, the exit operator is intimately connected to loops, it clears the exec stack up to the "frame" of the looping operator. For the 2-function style, this is a null object or similar. For the 1-function style, this is the looping operator itself.

回答3:

Your language isn't at all Forth-like, so don't copy Forth's (compilation-only - a meaningless distinction for your language) loop structures.

Looking at your code, add until as a conditional restart-evaluation word. That is, add it as a normal word alongside your range and jump, have it pop the stack, and if the top of the stack was true, set stack.c's cmd back to the beginning.

0
dup . 1 + dup 5 > until
.

, in your language, will produce the output 0 1 2 3 4 5 6, across three evaluation, and re-evaluating the second line several times. This assumes that you preserve state across evaluations, and that evaluation is line-oriented. You might mine LSE64 for more ideas like this.

回答4:

Here's a blog post in which DO/LOOP, BEGIN/UNTIL, WHILE/REPEAT, etc. are implimented in my little TransForth project: http://blogs.msdn.com/b/ashleyf/archive/2011/02/06/loopty-do-i-loop.aspx

I've since changed my mind however and rely entirely on recursion with no such looping words.

Hope that helps, have fun!