-->

是什么抽象语法树和具体语法树之间的区别?是什么抽象语法树和具体语法树之间的区别?(What is t

2019-05-08 19:50发布

我一直在阅读一些关于翻译/编译器是如何工作的,并在那里我感到困惑的一个领域是AST和CST之间的区别。 我的理解是,解析器使得CST,将它交给语义分析仪,它把它变成一个AST。 然而,我的理解是,语义分析只是确保规则得到遵守。 我真的不明白为什么它实际上做任何修改,使其抽象,而不是具体的。

有没有办法,我错过关于语义分析的东西,或者是AST和CST之间的区别有点人为?

Answer 1:

一个具体的语法树代表解析的形式完全相同的源文本。 在一般情况下,它符合上下文无关文法定义的源语言。

但是,具体的语法和树有很多事情是需要做的源文本明确解析的,但无助于实际意义。 例如,为了实现运算符优先级,你的CFG通常有表达组件(长期,因子等)的几个层次,与运营商在不同层面连接它们(添加方面得到表达,术语组成的可选multipled因素等)。 要真正解释或编译的语言,但是,你不需要这个; 你只需要有运营商和操作数表达式节点。 抽象语法树是简化了具体语法树下来这个东西实际需要来代表程序的意义的结果。 这棵树有一个更简单的定义,因此更容易执行的后期处理。

你通常并不需要实际建立一个具体的语法树。 在YACC动作程序(或ANTLR的,或巨石,或任何...)语法可以直接建立抽象语法树,所以具体语法树只存在为代表的源文本的解析结构的概念实体。



Answer 2:

A concrete syntax tree matches what the grammar rules say is the syntax. The purpose of the abstract syntax tree is have a "simple" representation of what's essential in "the syntax tree".

A real value in the AST IMHO is that it is smaller than the CST, and therefore takes less time to process. (You might say, who cares? But I work with a tool where we have tens of millions of nodes live at once!).

Most parser generators that have any support for building syntax trees insist that you personally specify exactly how they get built under the assumption that your tree nodes will be "simpler" than the CST (and in that, they are generally right, as programmers are pretty lazy). Arguably it means you have to code fewer tree visitor functions, and that's valuable, too, in that it minimizes engineering energy. When you have 3500 rules (e.g., for COBOL) this matters. And this "simpler"ness leads to the good property of "smallness".

But having such ASTs creates a problem that wasn't there: it doesn't match the grammar, and now you have to mentally track both of them. And when there are 1500 AST nodes for a 3500 rule grammar, this matters a lot. And if the grammar evolves (they always do!), now you have two giant sets of things to keep in synch.

Another solution is to let the parser simply build CST nodes for you and just use those. This is a huge advantage when building the grammars: there's no need to invent 1500 special AST nodes to model 3500 grammar rules. Just think about the tree being isomorphic to the grammar. From the point of view of the grammar engineer this is completely brainless, which lets him focus on getting the grammar right and hacking at it to his heart's content. Arguably you have to write more node visitor rules, but that can be managed. More on this later.

What we do with the DMS Software Reengineering Toolkit is to automatically build a CST based on the results of a (GLR) parsing process. DMS then automatically constructs an "compressed" CST for space efficiency reasons, by eliminating non-value carrying terminals (keywords, punctation), semantically useless unary productions, and forming lists for grammar rule pairs that are list like:

    L = e ;
    L = L e ;
    L2 = e2 ;
    L2 = L2  ','  e2 ;

and a wide variety of variations of such forms. You think in terms of the grammar rules and the virtual CST; the tool operates on the compressed representation. Easy on your brain, faster/smaller at runtime.

Remarkably, the compressed CST built this way looks a lot an AST that you might have designed by hand (see link at end to examples). In particular, the compressed CST doesn't carry any nodes that are just concrete syntax. There are minor bits of awkwardness: for example while the concrete nodes for '(' and ')' classically found in expression subgrammars are not in the tree, a "parentheses node" does appear in the compressed CST and has to be handled. A true AST would not have this. This seems like a pretty small price to pay for the convenience of not have to specify the AST construction, ever. And the documentation for the tree is always available and correct: the grammar is the documentation.

How do we avoid "extra visitors"? We don't entirely, but DMS provides an AST library that walks the AST and handles the differences between the CST and the AST transparently. DMS also offers an "attribute grammar" evaluator (AGE), which is a method for passing values computed a nodes up and down the tree; the AGE handles all the tree representation issues and so the tool engineer only worries about writing computations effectively directly on the grammar rules themselves. Finally, DMS also provides "surface-syntax" patterns, which allows code fragments from the grammar to used to find specific types of subtrees, without knowing most of the node types involved.

One of the other answers observes that if you want to build tools that can regenerate source, your AST will have to match the CST. That's not really right, but it is far easier to regenerate the source if you have CST nodes. DMS generates most of the prettyprinter automatically because it has access to both :-}

Bottom line: ASTs are good for small, both phyiscal and conceptual. Automated AST construction from the CST provides both, and lets you avoid the problem of tracking two different sets.

EDIT March 2015: Link to examples of CST vs. "AST" built this way



Answer 3:

本博客文章可能会有所帮助。

在我看来,该AST“扔掉”了很多,不会有助于语义语法中间体/结构信息。 例如,你不关心3是一个原子是一个术语是一个因素是....你只关心它的3 ,当你实现幂表达或什么的。



Answer 4:

这是基于表达式求值泰伦斯帕尔语法。

这个例子的语法:

grammar Expr002;

options 
{
    output=AST;
    ASTLabelType=CommonTree; // type of $stat.tree ref etc...
}

prog    :   ( stat )+ ;

stat    :   expr NEWLINE        -> expr
        |   ID '=' expr NEWLINE -> ^('=' ID expr)
        |   NEWLINE             ->
        ;

expr    :   multExpr (( '+'^ | '-'^ ) multExpr)*
        ; 

multExpr
        :   atom ('*'^ atom)*
        ; 

atom    :   INT 
        |   ID
        |   '('! expr ')'!
        ;

ID      : ('a'..'z' | 'A'..'Z' )+ ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : ( ' ' | '\t' )+ { skip(); } ;

输入

x=1
y=2
3*(x+y)

解析树

分析树是输入的一个具体表现。 解析树保留所有的输入信息。 空框表示线的空白,即,端部。

AST

AST是输入的抽象表示。 请注意,括号是不存在的,因为AST的关联是从树结构中导出。

编辑

对于更多的是通过解释见编译器和编译器发电机皮克。 23



Answer 5:

具体语法树如下语言的语法规则。 在语法,“表达式列表”通常与两个规则定义

  • expression_list可以是:表达
  • expression_list可以是:表达,expression_list

随后从字面上看,这两个规则给出了一个梳形状出现在节目的任何表达式列表。

抽象语法树是这便于进一步的操作形式。 它代表的方式,是有道理的,有人理解程序的意义,而不仅仅是它们写入事物的方式。 上面的表达式列表,其可以是的函数的参数列表中,可以方便地表示为表达式向量,因为它是为静态分析最好是有明确可用的表达的总数量,并能够通过访问每个表达其指数。



Answer 6:

具体语法树包含像多余的括号和空格和注释的所有信息,抽象语法树抽象从这个信息了。

 

注: 够有趣,当你实现一个重构引擎您的AST将再次包含所有的具体信息,但你会继续将其称为一个AST,因为这已经成为该领域的标准术语(所以可以说有很长前失去了原有的意义)。



Answer 7:

这是有区别的不有所作为。

一个AST通常是作为一种由扔掉词汇内容来近似一种编程语言表达的语义解释。 例如,在上下文无关文法可以编写以下EBNF规则

term: atom (('*' | '/') term )*

而在AST情况下,你只能使用和表达正确的算术运算。

不能将那些规则摆在首位的语法介绍? 当然。 你可以把它分成用来模仿提到AST节点更具体的规则,把上面的紧凑和抽象的规则:

term: mul_rule | div_rule
mul_rule: atom ('*' term)*
div_rule: atom ('/' term)*

现在,当你认为在自上而下的分析而言,则第二介绍和东西的LL(1)语法分析器无法处理之间的第一/前冲突。 第一条规则的形式是,其有效地消除结构中的第二个的左侧因式分解型式。 你必须支付一些奖金使用LL(1)在这里。

所以,AST的是用来固定的语法和解析器的不足之处特设的补充。 该CST - > AST转型是一个重构的举动。 当一个额外的逗号或冒号存储在语法树没有人打扰。 相反一些学者改造他们到AST的,因为他们喜欢用的AST在同一时间做的重构,而不是维持树木或写一个额外的推理引擎。 程序员是懒惰了很好的理由。 其实他们即使存储线和词法分析AST的聚集错误报告列信息。 很抽象确实如此。



Answer 8:

简单地说,AST只包含代码的语义分析树/ CST还包括了代码究竟是怎样的书面资料。



Answer 9:

CST(具体语法树)是语法树表示(方案应该怎么写规则)。 取决于编译器架构,它可以由分析器用来产生AST。

AST(抽象语法树)被解析源的树表示,由编译器的分析器部分产生。 它存储了有关令牌+语法信息。

根据您的编译器架构,CST可用于产生AST。 它是公平地说,CST演变成AST。 或者,AST是更丰富的CST。

更多的说明可以此链接上找到: http://eli.thegreenplace.net/2009/02/16/abstract-vs-concrete-syntax-trees#id6



文章来源: What is the difference between an Abstract Syntax Tree and a Concrete Syntax Tree?