After creating the parse tree, i have to populate symbol table now.
I have to store information like
Type, Scope, Offset etc for the identifiers.
Now how do i know the type, scope of the identifiers , since all i know is the lexeme value and line number for that particular ID (after lexical analysis).
How do i got about the whole thing. thanks.
That's not true. You know where it is declared in the parse tree, which tells you everything you need. You do this step by processing the parse tree.
As EJP mentioned, you need to step through the parse tree. Your tree should have been created so that you can do an in-order traversal, visiting each node in the same order in which source code statements and expressions are evaluated. Your tree nodes should also correspond with a specific language construct, e.g.
WhileStmtNode
,MethodDeclNode
, etc.Suppose I am building the symbol table, recursively stepping through the tree, and I've just entered a method body node. I might have something like the following:
I keep a global variable to manage the scope. Every time I enter a block where the scope changes, I'd increment
currScope
. Similarly, I'd maintaincurrClass
andcurrMethod
variables to store with the symbol name, type, offset, etc. for later phases.Update:
Each tree node should contain all the necessary information for the entire construct. If you're using a parser generator, like CUP or Bison, you can specify how to build the tree in the grammar actions. E.g.
Those productions would match
Foo f;
and append a variable declaration node to the tree. That node encapsulates two identifier nodes that contain the line number, character number, and string value of the lexeme. The first identifier node is the type and the second is the variable name.ID
is a terminal symbol returned by the lexer upon matching an identifier. I'm assuming you are familiar with this to some extent.When you've got a syntax tree with nodes like this, you've got all the information you need.
2nd Update:
It doesn't matter whether you're using a custom parser or a generated one, there is one distinct point where you add a node into the tree upon matching a production. And it doesn't matter what language you're using. C structs will do just fine.
You must have specialized nodes in the tree, e.g. ClassNode, TypeNode, MethodDeclNode, IfStmtNode, ExprNode. You can't just store one type of node and put non-terminals and terminals in it. A non-terminal is represented as a tree node, there's no other information to store about it beside the parts that compose it, which are generally non-terminals themselves. You wouldn't store any token information. There are only but a few instances where you'd actually store a lexeme's string value: for an identifier and for a string/boolean/integer literal.
Have a look at this example. During the first reduction when
S
is reduced to(S + F)
, you append aParenExprNode
to the tree root. You also append aAddExprNode
as child of theParenExprNode
. That logic must be hard-coded into your parser when applying a reduction by rule 2 of the grammar.The tree:
The code:
In the next step, the left parenthesis is removed from the input. Afterward,
S
is on top of the stack and it is reduced toF
by rule 1. SinceF
is the non-terminal for an identifier, it's represented byIdNode
.The tree:
The code:
And so on...