Clang : What does AST (abstract syntax tree) look

2019-03-16 00:28发布

问题:

Hi I am new in Compiler development, and am wondering how AST look like. I have a small section of code, and I use Clang for generating the AST. I don't get much information out of it. From the looks of it, the Syntax tree is exactly the same as the source, except for one struct that is added to almost any sample I test with.

Source:

class A {
public:
  int *a, *b, *c;
  int i;
  void sum() {
    a = new int[5];
    b = new int[5];
    c = new int[5];
    for (i = 0; i < 5; i++) {
      a[i] = i;
      b[i] = i;
    }
    for (i = 0; i < 5; i++) {
      c[i] = a[i] + b[i];
    }
    delete[] a;   delete[] b;   delete[] c;
  }
};

class B : public A {
};

int main() {
  B bclass; 
  bclass.sum();
  return 0;
} 

Command to generate AST:

clang++ -cc1 -ast-print ~/sum.cpp

AST output:

struct __va_list_tag {
    unsigned int gp_offset;
    unsigned int fp_offset;
    void *overflow_arg_area;
    void *reg_save_area;
};
typedef struct __va_list_tag __va_list_tag;
class A {
public:
    int *a;
    int *b;
    int *c;
    int i;
    void sum()     {
        this->a = new int [5];
        this->b = new int [5];
        this->c = new int [5];
        for (this->i = 0; this->i < 5; this->i++) {
            this->a[this->i] = this->i;
            this->b[this->i] = this->i;
        }
        for (this->i = 0; this->i < 5; this->i++) {
            this->c[this->i] = this->a[this->i] + this->b[this->i];
        }
        delete [] this->a;
        delete [] this->b;
        delete [] this->c;
    }


};
class B : public A {
};
int main() {
    B bclass;
    bclass.sum();
    return 0;
}

Thanks

回答1:

There is a small confusion between the various options available:

  • -ast-print will pretty-print the current AST, that is, it will render the code it understood as closely as possible to what it parsed (but making some things explicit, like the apparition of the this)
  • -ast-dump will generate a lisp-like representation of the current AST

The pretty printer can be useful to check that the AST is lossless (ie, preserved the const-ness of such expression, etc...) but is not really about development.

If you want to hack on the compiler, you need -ast-dump, which will generate an output that maps directly the in-memory representation of the code that was parsed.



回答2:

The AST is a linked structure in memory ("tree" does not make justice to the complexity of the thing, but it's the name people use). What -ast-print produces is a textual representation of the AST. Since the human who set the option is already familiar with C/C++-like syntax, it is printed in a representation that follows that syntax. This is a design choice, not a happy coincidence.

If you want to see what the AST looks like when it's not printed on purpose in a familiar syntax, you could for instance look at GIMPLE, GCC's internal representation.



回答3:

And if you want to play with GIMPLE, you could even use GCC MELT for that purpose. MELT is a high-level domain specific language to deal with GIMPLE!

And inside compilers, the internal representation are often not trees, but somehow circular structures. In GCC, a basic block knows it gimple-s, but the gimple-s may know their basic blocks.... (it is a bit more complex, but you've got the idea).