Code generator for expressions using Sethi-Ullman

2019-04-09 13:40发布

Give a AST tree, I want to generate an assembly-like language. I'm trying using Sethi-Ullman algorithm but I have some questions in the algorithm implemetation.

What should I do when I run out of registers?

currently I do the following:

Emit a push REG where REG is the register of right subtree, evaluate left subtree, get one free register assign as register of right subtree and then emit a POP REG operation where REG is the register of right subtree too.

How should I implement the function to get a free register? currently I'm using an implementation like this instead of a stack-based:

enum Reg { Reg_r0, Reg_r1 };
Reg regs[] = { Reg_r0, Reg_r1 };
    Reg getreg() {
             static int c;
        if(c == sizeof(regs) / sizeof(int))
        c = 0;
        return regs[c++];
    }

Here's a pseudo-code (from C-language) how to implement it from what I unsertood(including the label() function)

// K is the number of registers available
int K = 2;
void gen(AST ast) {
    if(ast.left != null && ast.right != null) {
        int l = ast.left.n;
        int r = ast.right.n;

        if(l >= K && r >= K) {
            gen(ast.right);
            ast.n -= 1;
            emit_operation(PUSH, ast.right.reg);
            gen(ast.left);
            ast.reg = getreg();
            emit_operation(POP, ast.right.reg);
        } else if(l >= r) {
            gen(ast.left);
            gen(ast.right);
            ast.n -= 1;
        } else if(l < r) {
            gen(ast.right);
            gen(ast.left);
            ast.n -= 1;
        }

        ast.reg = getreg();
        Reg r1 = ast.left.reg;
        Reg r2 = ast.right.reg;
        emit_operation(ast.type, r1, r2);
    } else if(ast.type == Type_id || ast.type == Type_number) {
        ast.n += 1;
        ast.reg = getreg();
        emit_load(ast);
    } else {
        print("gen() error");
        // error
    }
}

// ershov numbers
void label(AST ast) {
    if(ast == null)
        return;

    label(ast.left);
    label(ast.right);

    if(ast.type == Type_id || ast.type == Type_number)
        ast.n = 1;
    // ast has two childrens
    else if(ast.left not null && ast.right not null) {      
        int l = ast.left.n;
        int r = ast.right.n;

        if(l == r)
            ast.n = 1 + l;
        else
            ast.n = max(1, l, r);
    }
    // ast has one child
    else if(ast.left not null && ast.right is null)
        ast.n = ast.left.n;
    else
        print("label() error!");
}

EDIT: Tell me please if more context is needed to understand this.

1条回答
戒情不戒烟
2楼-- · 2019-04-09 14:03

Sethi-Ullman is really just a scheduling algorithm, not a register allocation algorithm, so it just tells you the order in which to do operations to minimize the number of registers needed; it does not tell you which registers to use where.

So you need to combine it with a register allocation strategy -- usually just a greedy allocator. Then there's the question of what to do if you run out of registers -- do you insert spills inline, or abort and do something else?

In order to do simple greedy allocation inline with your scheduling and instruction generation (what you seem to be doing with your simple gen recursive procedure), you'll need to keep track of which registers are in use at any given time. The easiest way is by adding an extra in_use argument to your gen function:

typedef unsigned RegSet; /* using a simple bitmask for a set -- assuming that
                          * unsigned is big enough to have a bit per register */

void gen(AST *ast, RegSet in_use) {
    if(ast->left != 0 && ast->right != 0) {
        if (ast->left->n >= ast->right->n) {
            gen(ast->left, in_use);
            gen(ast->right, in_use | (1 << ast->left->reg));
        } else {
            gen(ast->right, in_use);
            gen(ast->left, in_use | (1 << ast->right->reg)); }
        ast->reg = ast->left->reg
        emit_operation(ast->type, ast->left->reg, ast->right->reg);
    } else if(ast->type == Type_id || ast->type == Type_number) {
        ast->reg = pick_unused_register(in_use);
        emit_load(ast);
    } else ....

Note that you need a separate recursive pass to calculate n for each node (Sethi-Ullman inherently requires two traversals, with the first traversal computing bottom up the n value for the second traversal to use top-down).

Now the above code doesn't deal with running out of registers at all. To do that, you need to insert some spills. One way is to detect that all registers are in use before making a recursive call and spill then, restoring afterwards:

void gen(AST *ast, RegSet in_use) {
    if(ast->left != 0 && ast->right != 0) {
        Reg spill = NoRegister; /* no spill yet */
        AST *do1st, *do2nd;     /* what order to generate children */
        if (ast->left->n >= ast->right->n) {
            do1st = ast->left;
            do2nd = ast->right;
        } else {
            do1st = ast->right;
            do2nd = ast->left; }
        gen(do1st, in_use);
        in_use |= 1 << do1st->reg;
        if (all_used(in_use)) {
            spill = pick_register_other_than(do1st->reg);
            in_use &= ~(1 << spill);
            emit_operation(PUSH, spill); }
        gen(do2nd, in_use);
        ast->reg = ast->left->reg
        emit_operation(ast->type, ast->left->reg, ast->right->reg);
        if (spill != NoRegister)
            emit_operation(POP, spill);
    } else ...

Of course, this turns out to be not terribly efficient -- its usually better to spill sooner and refill later, but only when you know you're going to run out of registers.

查看更多
登录 后发表回答