Creating a Brainfuck parser, whats the best method

2019-04-30 00:03发布

I'm creating a Brainfuck parser (in a BASIC dialect) ultimately to create an interpreter but i've realise it's not as straight forward as i first thought. My problem is that i need a way to accurately parse the matching loop operators within a Brainfuck program. This is an example program:

,>,>++++++++[<------<------>>-]
<<[>[>+>+<<-]>>[<<+>>-]<<<-]
>>>++++++[<++++++++>-],<.>.

'[' = start of loop

']' = end of loop

I need to record the start and end point of each matching loop operator so i can jump around the source as needed. Some loops are alone, some are nested.

What would be the best way to parse this? I was thinking maybe move through the source file creating a 2D array (or such like) recording the start and end positions of each matching operator, but this seems like a lot of 'to'ing and fro'ing' through the source. Is this the best way to do it?

More info: Brainfuck homepage

EDIT: Sample code in any language greatly appreciated.

10条回答
男人必须洒脱
2楼-- · 2019-04-30 00:14

Python 3.0 example of the stack algorithm described by the other posters:

program = """ 
,>,>++++++++[<------<------>>-]
<<[>[>+>+<<-]>>[<<+>>-]<<<-]
>>>++++++[<++++++++>-],<.>.
"""

def matching_brackets(program):
    stack = []

    for p, c in enumerate(program, start=1):
        if c == '[':
            stack.append(p)
        elif c == ']':
            yield (stack.pop(), p)

print(list(matching_brackets(''.join(program.split()))))

(Well, to be honest, this only finds matching brackets. I don't know brainf*ck, so what to do next, I have no idea.)

查看更多
冷血范
3楼-- · 2019-04-30 00:17

Each time you find a '[', push the current position (or another "marker" token or a "context") on a stack. When you come accross a ']', you're at the end of the loop, and you can pop the marker token from the stack.

Since in BF the '[' already checks for a condition and may need jump past the ']', you may want to have a flag indicating that instructions shall be skipped in the current loop context.

查看更多
劳资没心,怎么记你
4楼-- · 2019-04-30 00:18

I don't have sample code, but.

I might try using a stack, along with an algorithm like this:

  1. (executing instruction stream)
  2. Encounter a [
  3. If the pointer == 0, then keep reading until you encounter the ']', and don't execute any instructions until you reach it.. Goto step 1.
  4. If the pointer !=0, then push that position onto a stack.
  5. Continue executing instructions
  6. If you encounter a ]
  7. If pointer==0, pop the [ off of the stack, and proceed (goto step 1)
  8. If pointer != 0, peek at the top of the stack, and go to that position. (goto step 5)
查看更多
唯我独甜
5楼-- · 2019-04-30 00:24

It looks like this question has become a "post your bf interpreter" poll.

So here's mine that I just got working:

#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

void error(char *msg) {
    fprintf(stderr, "Error: %s\n", msg);
}

enum { MEMSIZE = 30000 };

char *mem;
char *ptr;
char *prog;
size_t progsize;

int init(char *progname) {
    int f,r;
    struct stat fs;
    ptr = mem = calloc(MEMSIZE, 1);
    f = open(progname, O_RDONLY);
    assert(f != -1);
    r = fstat(f, &fs);
    assert(r == 0);
    prog = mmap(NULL, progsize = fs.st_size, PROT_READ, MAP_PRIVATE, f, 0);
    assert(prog != NULL);
    return 0;
}

int findmatch(int ip, char src){
    char *p="[]";
    int dir[]= { 1, -1 };
    int i;
    int defer;
    i = strchr(p,src)-p;
    ip+=dir[i];
    for (defer=dir[i]; defer!=0; ip+=dir[i]) {
        if (ip<0||ip>=progsize) error("mismatch");
        char *q = strchr(p,prog[ip]);
        if (q) {
            int j = q-p;
            defer+=dir[j];
        }
    }
    return ip;
}

int run() {
    int ip;
    for(ip = 0; ip>=0 && ip<progsize; ip++)
        switch(prog[ip]){
        case '>': ++ptr; break;
        case '<': --ptr; break;
        case '+': ++*ptr; break;
        case '-': --*ptr; break;
        case '.': putchar(*ptr); break;
        case ',': *ptr=getchar(); break;
        case '[': /*while(*ptr){*/
                  if (!*ptr) ip=findmatch(ip,'[');
                  break;
        case ']': /*}*/
                  if (*ptr) ip=findmatch(ip,']');
                  break;
        }

    return 0;
}

int cleanup() {
    free(mem);
    ptr = NULL;
    return 0;
}

int main(int argc, char *argv[]) {
    init(argc > 1? argv[1]: NULL);
    run();
    cleanup();
    return 0;
}
查看更多
等我变得足够好
6楼-- · 2019-04-30 00:27

Interesting enough, just a couple days ago, I was writing a brainf*ck interpreter in Java.

One of the issues I was having was that the explanation of the commands at the official page was insufficient, and did not mention the part about nested loops. The Wikipedia page on Brainf*ck has a Commands subsection which describes the correct behavior.

Basically to summarize the problem, the official page says when an instruction is a [ and the current memory location is 0, then jump to the next ]. The correct behavior is to jump to the corresponding ], not the next one.

One way to achieve this behavior is to keep track of the level of nesting. I ended up implementing this by having a counter which kept track of the nesting level.

The following is part of the interpreter's main loop:

do {
  if (inst[pc] == '>') { ... }
  else if (inst[pc] == '<') { ... }
  else if (inst[pc] == '+') { ... }
  else if (inst[pc] == '-') { ... }
  else if (inst[pc] == '.') { ... }
  else if (inst[pc] == ',') { ... }
  else if (inst[pc] == '[') {
    if (memory[p] == 0) {
      int nesting = 0;

      while (true) {
        ++pc;

        if (inst[pc] == '[') {
          ++nesting;
          continue;
        } else if (nesting > 0 && inst[pc] == ']') {
          --nesting;
          continue;
        } else if (inst[pc] == ']' && nesting == 0) {
          break;
        }
      }
    }
  }
  else if (inst[pc] == ']') {
    if (memory[p] != 0) {
      int nesting = 0;

      while (true) {
        --pc;

        if (inst[pc] == ']') {
          ++nesting;
          continue;
        } else if (nesting > 0 && inst[pc] == '[') {
          --nesting;
          continue;
        } else if (inst[pc] == '[' && nesting == 0) {
          break;
        }
      }
    }
  }
} while (++pc < inst.length);

Here is the legend for the variable names:

  • memory -- the memory cells for the data.
  • p -- pointer to the current memory cell location.
  • inst -- an array holding the instructions.
  • pc -- program counter; points to the current instruction.
  • nesting -- level of the nesting of the current loop. nesting of 0 means that the current location is not in a nested loop.

Basically, when a loop opening [ is encountered, the current memory location is checked to see if the value is 0. If that is the case, a while loop is entered to jump to the corresponding ].

The way the nesting is handled is as follows:

  1. If an [ is encountered while seeking for the corresponding loop closing ], then the nesting variable is incremented by 1 in order to indicate that we have entered a nested loop.

  2. If an ] is encountered, and:

    a. If the nesting variable is greater than 0, then the nesting variable is decremented by 1 to indicate that we've left a nested loop.

    b. If the nesting variable is 0, then we know that the end of the loop has been encountered, so seeking the end of the loop in the while loop is terminated by executing a break statement.

Now, the next part is to handle the closing of the loop by ]. Similar to the opening of the loop, it will use the nesting counter in order to determine the current nesting level of the loop, and try to find the corresponding loop opening [.

This method may not be the most elegant way to do things, but it seems like it is resource-friendly because it only requires one extra variable to use as a counter for the current nesting level.

(Of course, "resource-friendly" is ignoring the fact that this interpreter was written in Java -- I just wanted to write some quick code and Java just happened to be what I wrote it in.)

查看更多
戒情不戒烟
7楼-- · 2019-04-30 00:29
package interpreter;

import java.awt.event.ActionListener;

import javax.swing.JTextPane;

public class Brainfuck {

    final int tapeSize = 0xFFFF;
    int tapePointer = 0;
    int[] tape = new int[tapeSize];
    int inputCounter = 0;

    ActionListener onUpdateTape;

    public Brainfuck(byte[] input, String code, boolean debugger,
            JTextPane output, ActionListener onUpdate) {
        onUpdateTape = onUpdate;
        if (debugger) {
            debuggerBF(input, code, output);
        } else {
            cleanBF(input, code, output);
        }
    }

    private void debuggerBF(byte[] input, String code, JTextPane output) {
        for (int i = 0; i < code.length(); i++) {
            onUpdateTape.actionPerformed(null);
            switch (code.charAt(i)) {
            case '+': {
                tape[tapePointer]++;
                break;
            }
            case '-': {
                tape[tapePointer]--;
                break;
            }
            case '<': {
                tapePointer--;
                break;
            }
            case '>': {
                tapePointer++;
                break;
            }
            case '[': {
                if (tape[tapePointer] == 0) {
                    int nesting = 0;

                    while (true) {
                        ++i;

                        if (code.charAt(i) == '[') {
                            ++nesting;
                            continue;
                        } else if (nesting > 0 && code.charAt(i) == ']') {
                            --nesting;
                            continue;
                        } else if (code.charAt(i) == ']' && nesting == 0) {
                            break;
                        }
                    }
                }
                break;
            }
            case ']': {
                if (tape[tapePointer] != 0) {
                    int nesting = 0;

                    while (true) {
                        --i;

                        if (code.charAt(i) == ']') {
                            ++nesting;
                            continue;
                        } else if (nesting > 0 && code.charAt(i) == '[') {
                            --nesting;
                            continue;
                        } else if (code.charAt(i) == '[' && nesting == 0) {
                            break;
                        }
                    }
                }
                break;
            }
            case '.': {
                output.setText(output.getText() + (char) (tape[tapePointer]));
                break;
            }
            case ',': {
                tape[tapePointer] = input[inputCounter];
                inputCounter++;
                break;
            }
            }
        }
    }

    private void cleanBF(byte[] input, String code, JTextPane output) {
        for (int i = 0; i < code.length(); i++) {
            onUpdateTape.actionPerformed(null);
            switch (code.charAt(i)) {
            case '+':{
                tape[tapePointer]++;
                break;
            }
            case '-':{
                tape[tapePointer]--;
                break;
            }
            case '<':{
                tapePointer--;
                break;
            }
            case '>':{
                tapePointer++;
                break;
            }
            case '[': {
                if (tape[tapePointer] == 0) {
                    int nesting = 0;

                    while (true) {
                        ++i;

                        if (code.charAt(i) == '[') {
                            ++nesting;
                            continue;
                        } else if (nesting > 0 && code.charAt(i) == ']') {
                            --nesting;
                            continue;
                        } else if (code.charAt(i) == ']' && nesting == 0) {
                            break;
                        }
                    }
                }
                break;
            }
            case ']': {
                if (tape[tapePointer] != 0) {
                    int nesting = 0;

                    while (true) {
                        --i;

                        if (code.charAt(i) == ']') {
                            ++nesting;
                            continue;
                        } else if (nesting > 0 && code.charAt(i) == '[') {
                            --nesting;
                            continue;
                        } else if (code.charAt(i) == '[' && nesting == 0) {
                            break;
                        }
                    }
                }
                break;
            }
            case '.':{
                output.setText(output.getText()+(char)(tape[tapePointer]));
                break;
            }
            case ',':{
                tape[tapePointer] = input[inputCounter];
                inputCounter++;
                break;
            }
            }
        }
    }

    public int[] getTape() {
        return tape;
    }

    public void setTape(int[] tape) {
        this.tape = tape;
    }

    public void editTapeValue(int counter, int value) {
        this.tape[counter] = value;
    }

}

This should work. You need to modify it somewhat. That is actually standard example how brainfuck interpreter works. I modified it to use in my app, brackets are handled there:

case '[': {
    if (tape[tapePointer] == 0) {
        int nesting = 0;

        while (true) {
            ++i;

            if (code.charAt(i) == '[') {
                ++nesting;
                continue;
            }
            else if (nesting > 0 && code.charAt(i) == ']') {
                --nesting;
                continue;
            }
            else if (code.charAt(i) == ']' && nesting == 0) {
                break;
            }
        }
    }
    break;
}
case ']': {
    if (tape[tapePointer] != 0) {
        int nesting = 0;

        while (true) {
            --i;

            if (code.charAt(i) == ']') {
                ++nesting;
                continue;
            }
            else if (nesting > 0 && code.charAt(i) == '[') {
                --nesting;
                continue;
            }
            else if (code.charAt(i) == '[' && nesting == 0) {
                break;
            }
        }
    }
    break;
}
查看更多
登录 后发表回答