We have as an assignment to make a compiler. We have already made the lexical and syntax analysis but we are stuck at generation of intermediate code. We realized that we have to implement a symbol table in order to proceed to intermediate code generation and we don't know, how to do it and what it contains.
Given the code below, what should the symbol table contain? (The code is written in an educational language which is described below)
Also how can we implement scopes in our symbol table?
<PROGRAM> ::= PROGRAM ID <BLOCK> ENDPROGRAM <BLOCK> ::= {<DECLARATIONS> <SUBPROGRAMS> <SEQUENCE>} <DECLARATIONS> ::= ε | DECLARE <VARLIST> ENDDECLARE <VARLIST> ::= ε | ID ( , ID )* <SUBPROGRAMS> ::= ( <PROCORFUNC> ) * <PROCORFUNC> ::= PROCEDURE ID <PROCORFUNCBODY> ENDPROCEDURE | FUNCTION ID <PROCORFUNCBODY> ENDFUNCTION <PROCORFUNCBODY> ::= <FORMALPARS> <BLOCK> <FORMALPARS> ::= ε | ( <FORMALPARLIST> ) <FORMALPARLIST> ::= <FORMALPARITEM> ( , <FORMALPARITEM> )* <FORMALPARITEM> ::= IN ID | INOUT ID <SEQUENCE> ::= <STATEMENT> ( ; <STATEMENT> )* <STATEMENT> ::= ε | <ASSIGNMENT-STAT> | <IF-STAT> | <WHILE-STAT> | <FOR-STAT> | <EXIT-STAT> | <CALL-STAT> | <RETURN-STAT> <ASSIGNMENT-STAT> ::= ID := <EXPRESSION> <IF-STAT> ::= IF <CONDITION> THEN <SEQUENCE> <ELSEPART> ENDIF <ELSEPART> ::= ε | ELSE <SEQUENCE> <WHILE-STAT> ::= DO {<SEQUENCE>} WHILE (<CONDITION>) <FOR-STAT> ::= (<ASSIGNMENT-STAT>; <CONDITION>;<ASSIGNMENT-STAT>;) {<SEQUENCE>} <EXIT-STAT> ::= EXIT <CALL-STAT> ::= CALL ID <ACTUALPARS> <ACTUALPARS> ::= ( <ACTUALPARLIST> ) | ε <ACTUALPARLIST> ::= <ACTUALPARITEM> ( , <ACTUALPARITEM> )* <ACTUALPARITEM> ::= IN <EXPRESSION> | INOUT ID <RETURN-STAT> ::= RETURN <EXPRESSION> <CONDITION> ::= <BOOLTERM> (OR <BOOLTERM>)* <BOOLTERM> ::= <BOOLFACTOR> (AND <BOOLFACTOR>)* <BOOLFACTOR> ::= NOT [<CONDITION>] | [<CONDITION>] | <EXPRESSION> <RELATIONAL-OPER> <EXPRESSION> | TRUE | FALSE <EXPRESSION> ::= <OPTIONAL-SIGN> <TERM> ( <ADD-OPER> <TERM>)* <TERM> ::= <FACTOR> (<MUL-OPER> <FACTOR>)* <FACTOR> ::= CONSTANT | (<EXPRESSION>) | ID <IDTAIL> <IDTAIL> ::= ε | <ACTUALPARS> <RELATIONAL-OPER> ::= = | < ( ε | = | > ) | > ( ε | = ) <ADD-OPER> ::= + | - <MUL-OPER> ::= * | / <OPTIONAL-SIGN> ::= ε | <ADD-OPER>
PROGRAM MULTIPLY
{
DECLARE
A, B, C
ENDDECLARE
PROCEDURE Aop(INOUT A)
{
A=A+1;
}
ENDPROCEDURE
FUNCTION Bop(IN B){
IF [NOT[[TRUE AND FALSE]OR[TRUE]]] THEN B := 100 / 2;
ELSE B := 100;
ENDIF;
RETURN B;
}
ENDFUNCTION
CALL Aop(INOUT A);
CALL Bop(IN B);
A := 40;
C := A * B;
}
ENDPROGRAM
A symbol table maps identifiers (typically prefixed by a scope name) to information about that identifier, such as its symbol type (local variable / parameter / function / class etc.), data type, its order relative to the other identifiers in the same scope, its source code line, etc. The symbol table can be generated by traversing the abstract syntax tree, by always keeping track of which scope you're in and adding information to the symbol table whenever you hit a variable declaration. In your example, a part of the symbol table might look like this (mapping to symbol type, data type, position, and source code line):
Now, you can resolve all variable references. For instance, in the expression
A := A + 1
, if you know that your current scope isMULTIPLY.Aop
, the symnbol table will let you find out that thisA
is an input/output parameter of typeINT
, and that it is the first parameter (this information will let you generate a stack address offset so that you can load/store the variable).