Semantic type checking analysis in Bison

Posted 2020-07-22 17:54

Question:

I've been looking for examples everywhere, but in vain.

I am trying to write a basic Ruby interpreter. For this, I wrote a flex lexer file containing the token-recognition rules, and a Bison grammar file.

I want my grammar to perform semantic type checking.

My grammar file contains, for example:

arg : arg '+' arg 

This rule should be valid for both integers and floats.

From what I've read, I can declare the type of a nonterminal such as arg like this:

%type <intval> arg

where intval is a member of the %union and corresponds to the C type int.
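For context, here is a minimal sketch of what this declaration amounts to, assuming a `%union` with a hypothetical second member `floatval`. Roughly, Bison turns the `%union` into a C union named `YYSTYPE`, and the `<intval>` tag on `%type` only selects which member `$$`, `$1` and `$3` refer to in that nonterminal's actions. The function name `arg_plus_arg` is made up for illustration. The sketch shows the limitation behind the question: the member is chosen when the grammar is compiled, and nothing in the union records at run time which member currently holds a valid value.

```c
/* Roughly what Bison makes of
 *   %union { int intval; double floatval; }
 * (floatval is a hypothetical second member for illustration). */
typedef union YYSTYPE {
    int intval;
    double floatval;
} YYSTYPE;

/* With "%type <intval> arg", the action in
 *   arg : arg '+' arg { $$ = $1 + $3; }
 * has $$, $1 and $3 rewritten to refer to the intval member.
 * This function mimics that expanded action: the member is fixed
 * at grammar-compile time, and the union itself stores no record
 * of whether an int or a double was last written into it. */
static YYSTYPE arg_plus_arg(YYSTYPE lhs, YYSTYPE rhs) {
    YYSTYPE result;
    result.intval = lhs.intval + rhs.intval;
    return result;
}
```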

But this covers only integers, and I'm not sure how to make the rule valid for floats as well. I thought about having two different rules, one for ints and one for floats, like:

argint : argint '+' argint
argfloat : argfloat '+' argfloat

but I'm sure there is a much, much better way, since this atrocity would also require extra rules to allow additions between floats and ints.

All the examples I've found use only one type (usually integers, in calculator-like examples).

How can I specify that a rule such as addition accepts both ints and floats as operands?

Thank you very much.

Answer 1:

This isn't the answer you're hoping for. I think the reason you haven't seen examples of what you want is that it's impractical to enforce typing rules in the grammar file (the .y); instead, developers enforce them in procedural .c or .cpp code. You will generally have to analyze the parsed input anyway, so enforcing the semantic rules comes as a byproduct of that analysis.
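As a rough illustration of moving the rules into C, here is a minimal sketch assuming an interpreter that carries every semantic value as a tagged struct; `Value`, the `VAL_*` tags, the `make_*` helpers and `value_add` are all hypothetical names, not part of Bison. With this approach, the single rule `arg : arg '+' arg` from the question can serve ints, floats and mixed operands alike, because the type decision happens in the action rather than in the grammar.

```c
#include <assert.h>

/* Hypothetical run-time type tags for an interpreter's values. */
typedef enum { VAL_INT, VAL_FLT, VAL_STR, VAL_ERROR } ValueTag;

/* A tagged value: unlike a bare union, the tag records which
 * member of the union is currently valid. */
typedef struct {
    ValueTag tag;
    union {
        long i;
        double f;
        const char *s;
    } as;
} Value;

static Value make_int(long i)        { Value v; v.tag = VAL_INT; v.as.i = i; return v; }
static Value make_flt(double f)      { Value v; v.tag = VAL_FLT; v.as.f = f; return v; }
static Value make_str(const char *s) { Value v; v.tag = VAL_STR; v.as.s = s; return v; }
static Value make_error(void)        { Value v; v.tag = VAL_ERROR; v.as.i = 0; return v; }

static int is_numeric(Value v) { return v.tag == VAL_INT || v.tag == VAL_FLT; }

/* Widen a numeric value to double. */
static double as_double(Value v) {
    assert(is_numeric(v));
    return v.tag == VAL_INT ? (double)v.as.i : v.as.f;
}

/* Semantic rule for '+': int + int stays int, any float operand
 * promotes the result to float, and a non-numeric operand is a
 * type error. (String concatenation is left out of this sketch.) */
static Value value_add(Value lhs, Value rhs) {
    if (!is_numeric(lhs) || !is_numeric(rhs))
        return make_error();
    if (lhs.tag == VAL_INT && rhs.tag == VAL_INT)
        return make_int(lhs.as.i + rhs.as.i);
    return make_flt(as_double(lhs) + as_double(rhs));
}
```

An action such as `{ $$ = value_add($1, $3); }` would then do the work, with a check for `VAL_ERROR` wherever you want to report the type error. How the `Value` struct gets onto Bison's value stack (for example as a `%union` member) is a separate design choice.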

As an aside, I don't quite understand how you are parsing expressions, given the fragment of your grammar that you reproduce in your question.

Here's why I claim it's impractical. (1) Your type information has to percolate through all the nonterminals of the grammar. (2) Worse, it has to be reflected in the variable names themselves, because the lexer must be able to tell a variable's type from its spelling.

Consider this toy example of parsing simple assignment statements that can use identifiers, numeric constants, and the four desk-calculator operators. The NUMBER token can be an integer like 42 or a float like 3.14, and let's say that an IDENTIFIER is a single letter, A-Z.

%token IDENTIFIER NUMBER

%%

stmt : IDENTIFIER '=' expr
     ;

expr : expr '+' term
     | expr '-' term
     | term
     ;

term : term '*' factor
     | term '/' factor
     | factor
     ;

factor : '(' expr ')'
       | '-' factor
       | NUMBER
       | IDENTIFIER
       ;

Now let's try to introduce typing rules. We'll separate the NUMBER token into FLT_NUMBER and INT_NUMBER. Our expr, term, and factor non-terminals split into two as well:

%token IDENTIFIER FLT_NUMBER INT_NUMBER

%%

stmt : IDENTIFIER '=' int_expr
     | IDENTIFIER '=' flt_expr
     ;

int_expr : int_expr '+' int_term
         | int_expr '-' int_term
         | int_term
         ;

flt_expr : flt_expr '+' flt_term
         | flt_expr '-' flt_term
         | flt_term
         ;

int_term : int_term '*' int_factor
         | int_term '/' int_factor
         | int_factor
         ;

flt_term : flt_term '*' flt_factor
         | flt_term '/' flt_factor
         | flt_factor
         ;

int_factor : '(' int_expr ')'
           | '-' int_factor
           | INT_NUMBER
           | int_identifier
           ;

flt_factor : '(' flt_expr ')'
           | '-' flt_factor
           | FLT_NUMBER
           | flt_identifier
           ;

int_identifier : IDENTIFIER ;

flt_identifier : IDENTIFIER ;

As the grammar stands, there's a reduce/reduce conflict: the parser can't tell whether to reduce an IDENTIFIER to int_identifier or to flt_identifier, so it doesn't know whether to parse A = B as IDENTIFIER '=' int_expr or as IDENTIFIER '=' flt_expr.

(Here's where my understanding of Ruby is a little soft.) Ruby, like most languages, provides no way at the lexical level to determine the numeric type of an identifier. Contrast this with old-school BASIC, where A denotes a number and A$ denotes a string. So if you invented a language where, say, A# denotes an integer and A@ denotes a float, you could make this work.
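Since the type can't be read off the identifier's spelling, a common alternative is to keep the grammar untyped and record each variable's type in a symbol table during semantic analysis. Here is a minimal sketch for the toy language's one-letter identifiers, assuming that a variable takes on the type of whatever was last assigned to it (as in a dynamically typed language); `VarType`, `SymbolTable`, `sym_assign` and `sym_lookup` are hypothetical names.

```c
#include <assert.h>

/* Hypothetical types for the toy language; TYPE_UNKNOWN means
 * "not assigned yet". */
typedef enum { TYPE_UNKNOWN, TYPE_INT, TYPE_FLT } VarType;

/* One slot per one-letter identifier A-Z. */
typedef struct {
    VarType types[26];
} SymbolTable;

static void sym_init(SymbolTable *tab) {
    for (int k = 0; k < 26; k++)
        tab->types[k] = TYPE_UNKNOWN;
}

/* Map 'A'..'Z' to 0..25; the lexer guarantees the range. */
static int sym_index(char name) {
    assert(name >= 'A' && name <= 'Z');
    return name - 'A';
}

/* Called from the action for  stmt : IDENTIFIER '=' expr  with the
 * type computed for expr; the variable takes on that type. */
static void sym_assign(SymbolTable *tab, char name, VarType t) {
    tab->types[sym_index(name)] = t;
}

/* Called wherever an IDENTIFIER is used inside an expression. */
static VarType sym_lookup(const SymbolTable *tab, char name) {
    return tab->types[sym_index(name)];
}
```

With this, A = B goes through the single untyped stmt rule of the first grammar: the action for `factor : IDENTIFIER` calls `sym_lookup`, and the action for `stmt` calls `sym_assign`. A result of `TYPE_UNKNOWN` is a natural place to report an undefined variable.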

If you wanted to permit limited mixed-type expressions, such as int_term '*' flt_factor, your grammar would become even more complicated, because each combination of operand types would need its own productions.
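By contrast, if the rules live in a separate C pass, mixed-type arithmetic costs a single promotion rule. Here is a minimal sketch, assuming the actions of the original untyped grammar (stmt, expr, term, factor) build an abstract syntax tree; the `Node` layout, the `T_*` and `N_*` constants and `infer_type` are hypothetical. Variable types come from a caller-supplied array indexed by letter, where an entry of `T_ERROR` can stand for an unassigned variable.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_INT, T_FLT, T_ERROR } Type;

/* Hypothetical AST node kinds for the untyped toy grammar. Parentheses
 * need no node of their own: '(' expr ')' just passes its child up. */
typedef enum { N_INT_LIT, N_FLT_LIT, N_IDENT, N_NEG, N_BINOP } NodeKind;

typedef struct Node {
    NodeKind kind;
    char op;             /* '+', '-', '*', '/' for N_BINOP */
    char name;           /* 'A'-'Z' for N_IDENT */
    struct Node *left;   /* operand of N_NEG, left operand of N_BINOP */
    struct Node *right;  /* right operand of N_BINOP, otherwise NULL */
} Node;

/* Static type of an expression tree. Errors propagate upward, and
 * mixed int/float arithmetic promotes to float. */
static Type infer_type(const Node *n, const Type var_types[26]) {
    switch (n->kind) {
    case N_INT_LIT:
        return T_INT;
    case N_FLT_LIT:
        return T_FLT;
    case N_IDENT:
        assert(n->name >= 'A' && n->name <= 'Z');
        return var_types[n->name - 'A'];
    case N_NEG:
        return infer_type(n->left, var_types);
    case N_BINOP: {
        Type l = infer_type(n->left, var_types);
        Type r = infer_type(n->right, var_types);
        if (l == T_ERROR || r == T_ERROR)
            return T_ERROR;
        /* The whole mixed-type rule: */
        return (l == T_FLT || r == T_FLT) ? T_FLT : T_INT;
    }
    }
    assert(!"unknown node kind");
    return T_ERROR;
}
```

Because the pass sees the whole tree, the grammar needs no int_ and flt_ variants at all, and supporting another type means changing this function rather than duplicating every nonterminal.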

There might be ways to work around these issues, and a parser built with a tool other than yacc/bison might make this easier. At the least, perhaps my sketch will give you some ideas to pursue further.