The terms AST (Abstract Syntax Tree), parse tree, and derivation tree are bandied about by different people when referring to the result of parsing text that conforms to a grammar. Assuming we are talking about parsing computer languages, are their differences minor enough that we can use these terms interchangeably? If not, how do we use the terms correctly?
AFAIK, "derivation tree" and "parse tree" are the same.
See: Abstract syntax tree, Parse tree.
Take the source
a = (1 + 2) * 3;
for example. The parse tree would contain a node for every token in the source, including the parentheses and the semicolon, while the abstract syntax tree would reduce to an assignment to a whose value is a multiplication node with the addition of 1 and 2 as its left child and 3 as its right child.
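One way to see the abstract tree for that statement concretely is Python's ast module (an illustration I'm adding here, not part of the original answer; the statement is written in Python syntax, so the trailing semicolon is dropped):

```python
import ast

# Parse the assignment statement into an abstract syntax tree.
tree = ast.parse("a = (1 + 2) * 3")
assign = tree.body[0]

# The AST keeps only the structure: an Assign whose value is a
# multiplication of (1 + 2) by 3. Print it to inspect the shape.
print(ast.dump(assign.value))

# The source parentheses never appear as nodes; operator grouping
# is encoded purely in the nesting of BinOp nodes.
assert isinstance(assign.value.op, ast.Mult)
assert isinstance(assign.value.left.op, ast.Add)
```

Note that nothing in the dumped tree corresponds to the ( and ) tokens: the grouping they expressed in the source survives only as nesting.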
Parse/Derivation/Concrete syntax trees are all synonyms for the same concept.
Such trees are normally only used in theory discussions, because they contain lots of details that seem unnecessary for doing language processing; in an expression tree, do you really need a node to represent "(" and another to represent ")"?
The notion of an "abstract syntax" tree is one which represents the program structure to a level of detail that is adequate for processing in practice; you don't typically find nodes for "(...)".
An interesting question: is an AST directly computable from a CST? The answer should be yes, but people hardly ever do it. What they tend to do is construct "abstract syntax" nodes as the parser runs, and use ad hoc means (procedural attachments on rule reductions) to assemble the nodes from child parses with a glue node for the parent. IMHO, they do this because we've all been brought up on YACC, and that's how it is traditionally done. (We used to light fires with flint, too.) There's a lesser excuse: doing it this way gives the compiler-builder complete control of the structure of the AST, and he can produce one that is pretty minimal in terms of extra detail. Such an ad-hoc tree is not computable from the CST, except by the same ad-hoc computation that is embedded in the parser actions.
I've used a different approach: my tools compute ASTs directly from CSTs, literally by dropping irrelevant details, e.g., leaving out nodes that represent non-value-bearing tokens (those pointless '(' ')' tokens as well as keywords), compressing out chains of unary productions, and converting right- or left-leaning trees, which are really lists in disguise, into real list nodes. The advantage of doing this is that the parser can compute the AST directly from the grammar rules. No fiddling around with procedural attachments. No getting it wrong. No more worrying about the fact that our COBOL grammar has 3500 rules and I would otherwise need procedural goo for every one of them, or that I have to change my grammar hundreds of times to get it right and fiddle with the goo every time. And our tools work as though they operate directly on the CST, which makes it easy to think about tree manipulations, especially if you are staring directly at the grammar rules. (This also makes pattern matching using surface syntax much easier: for any pattern fragment, there's a directly computable AST that corresponds.)
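A minimal sketch of this kind of CST-to-AST reduction, using a hypothetical tree representation (nested tuples with plain strings as token leaves; this is my own illustration, not the author's actual tooling): drop punctuation leaves and compress single-child chains of productions.

```python
# Hypothetical CST encoding: (rule_name, child, child, ...) tuples,
# with plain strings for token leaves. A sketch, not real tooling.
PUNCTUATION = {"(", ")", ";"}

def to_ast(node):
    """Reduce a CST node by dropping punctuation tokens and
    compressing chains of unary productions."""
    if isinstance(node, str):                      # token leaf
        return None if node in PUNCTUATION else node
    name, *children = node
    kept = [c for c in (to_ast(ch) for ch in children) if c is not None]
    if len(kept) == 1:                             # unary chain: skip this level
        return kept[0]
    return (name, *kept)

# CST for "(1 + 2)": the parentheses and the unary expr->term
# steps all disappear in the abstract tree.
cst = ("expr", "(", ("expr", ("term", "1"), "+", ("term", "2")), ")")
print(to_ast(cst))   # ('expr', '1', '+', '2')
```

Because the reduction is a uniform function of the tree, it needs no per-rule code: the same few lines work whether the grammar has 35 rules or 3500.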
So the distinction between AST and CST is real in terms of utility. But I think they should be considered as just isomorphic representations.
I would use the term parse tree when the tree is produced by parsing, that is, when evaluating whether a given input sequence belongs to the language and determining which productions must be used, in which order, to regenerate the sequence.
A derivation tree would have exactly the same shape, but would be produced by the process of deriving a sequence from a given production.
The formal definition of parsing is finding a derivation for the given input sequence, so no wonder derivation and parse trees are the same.
Concrete versus abstract syntax trees differ in that the former has a leaf node for each token in the input sequence, while the latter omits any tokens that can be known by inspecting the grammar. A concrete syntax subtree for
if <expr> then <statement> else <statement> end
would have leaves for if, then, else, and end, while the abstract one would not. Likewise, for the input (2+3), the concrete syntax tree keeps a leaf for every token, parentheses included, while the abstract one is just an addition node with 2 and 3 as children.
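The two trees for (2+3) can be written down explicitly; here is a sketch using nested tuples (my own encoding, chosen only to make the leaf-per-token property checkable):

```python
# Concrete syntax tree: one leaf per input token, '(' and ')' included.
cst = ("expr", "(", ("expr", ("num", "2"), "+", ("num", "3")), ")")

# Abstract syntax tree: only the operator and its operands survive;
# the grouping the parentheses expressed is implied by the nesting.
ast_tree = ("+", 2, 3)

def leaves(node):
    """Collect token leaves of a CST in left-to-right order."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node[1:] for leaf in leaves(child)]

# Every input token is represented exactly once in the CST.
print(leaves(cst))   # ['(', '2', '+', '3', ')']
```

Reading off the leaves left to right reconstructs the original token sequence, which is exactly the property that distinguishes a concrete tree from an abstract one.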