Parsing Fortran-style .op. operators

I'm trying to write an ANTLR4 grammar for a Fortran-inspired DSL. I'm having difficulty with the good old classic ".op." operators:

if (1.and.1) then

where both "1"s should be interpreted as integers. I looked at the OpenFortranParser for insight, but I can't make sense of it.

Initially, I had suitable definitions for INTEGER and REAL in my lexer. Consequently, the first "1" above always parsed as a REAL, no matter what I tried. I tried moving things into the parser, and got it to the point where I could reliably recognize the ".and." along with numbers around it as appropriately INTEGER or REAL.

if (1.and.1)   # INT/INT
if (1..and..1) # REAL/REAL

...etc...

I of course want to recognize variable-names in such statements:

if (a.and.b)

and have an appropriate rule for ID. In the small grammar below, however, any literal that appears in quotes (e.g., 'and', 'if', or any of the single-character numeric suffixes) is not accepted as an ID, and I get an error; any other ID-conforming string is accepted:

if (a.and.b)  # errs, as 'b' is valid INTEGER suffix
if (a.and.c)  # OK

Any insights into this behavior, or better suggestions on how to parse the .op. operators in Fortran, would be greatly appreciated -- Thanks!

grammar Foo;

start  : ('if' expr | ID)+ ;

DOT : '.' ;

DIGITS: [0-9]+;

ID : [a-zA-Z0-9][a-zA-Z0-9_]* ;

andOp : DOT 'and' DOT ;

SIGN : [+-];

expr     
    : ID
    | expr andOp expr
    | numeric
    | '(' expr ')'
    ;

integer : DIGITS ('q'|'Q'|'l'|'L'|'h'|'H'|'b'|'B'|'i'|'I')? ;

real    
    : DIGITS DOT DIGITS? (('e'|'E') SIGN? DIGITS)? ('d' | 'D')?
    |        DOT DIGITS  (('e'|'E') SIGN? DIGITS)? ('d' | 'D')?
    ;

numeric : integer | real;

EOLN  : '\r'? '\n' -> skip;

WS    :  [ \t]+ -> skip;

To disambiguate DOT, add a lexer rule with a predicate just before the DOT rule.

DIT : DOT { isDIT() }? ;
DOT : '.' ;

Change the 'andOp' rule to use the new token:

andOp : DIT 'and' DIT ;

Then add a predicate method

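@lexer::members {

// True if the '.' being matched opens or closes an ".and." operator.
public boolean isDIT() {
    int offset = _tokenStartCharIndex;
    // Interval bounds are inclusive, so each window is up to 5 characters wide.
    // Clamp both windows so they never reach outside the input
    // (a '.' within the first 4 characters would otherwise give a negative start).
    int from = Math.max(0, offset - 4);
    int to = Math.min(_input.size() - 1, offset + 4);
    String r = _input.getText(Interval.of(from, offset)); // text ending at this dot
    String s = _input.getText(Interval.of(offset, to));   // text starting at this dot
    return ".and.".equals(s) || ".and.".equals(r);
}

}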
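
To see what the predicate decides for each dot, here is a minimal standalone sketch of the same lookaround on a plain String, so it can be run without the ANTLR runtime. The class and method names (DotLookaround, isOperatorDot) are hypothetical and not part of the grammar; it assumes the only operator of interest is ".and." in lower case.

```java
// Standalone sketch of the isDIT() lookaround, using a String instead of
// ANTLR's CharStream. DotLookaround and isOperatorDot are hypothetical names.
final class DotLookaround {
    private static final String OP = ".and.";

    // Returns true if the '.' at index dot opens or closes ".and." in text.
    static boolean isOperatorDot(String text, int dot) {
        // Opening dot: ".and." starts at this position.
        if (text.startsWith(OP, dot)) {
            return true;
        }
        // Closing dot: ".and." ends at this position.
        int start = dot - (OP.length() - 1);
        return start >= 0 && text.startsWith(OP, start);
    }

    public static void main(String[] args) {
        String intCase = "if (1.and.1)";
        String realCase = "if (1..and..1)";
        System.out.println(isOperatorDot(intCase, 5));  // true: opening dot of .and.
        System.out.println(isOperatorDot(intCase, 9));  // true: closing dot of .and.
        System.out.println(isOperatorDot(realCase, 5)); // false: decimal point of "1."
    }
}
```

In the second input, the dot right after the first "1" is rejected by the check, so it stays available as the decimal point of a REAL.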

But that is not really the source of your current problem. The quoted literals in the integer parser rule implicitly define lexer tokens, and those implicit tokens take precedence over ID, which is why 'b' is not matched as an ID.

Change it to

integer : INT ;

INT:  DIGITS ('q'|'Q'|'l'|'L'|'h'|'H'|'b'|'B'|'i'|'I')? ;

and the lexer will figure out the rest.
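
The effect can be illustrated with a toy model of how the lexer picks a rule: the longest match wins, and on a tie the rule defined first wins. This is only a sketch under that assumption, using java.util.regex rather than ANTLR itself; the class RulePriorityDemo, its methods, and the regular expressions are hypothetical stand-ins for the grammar's rules, and DIGITS and the other tokens are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of lexer rule selection (not ANTLR's actual implementation).
final class RulePriorityDemo {
    private static final String ID_REGEX = "[a-zA-Z0-9][a-zA-Z0-9_]*";

    // Picks the rule with the longest match at the start of input;
    // ties go to the rule that was defined (inserted) first.
    static String winner(Map<String, String> rules, String input) {
        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // Strictly greater keeps the earlier rule on a tie.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    // Original grammar: literals from parser rules act as tokens defined before ID.
    static Map<String, String> implicitLiteralsFirst() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("'b'", "b");
        rules.put("'and'", "and");
        rules.put("ID", ID_REGEX);
        return rules;
    }

    // After the change: the suffix letters only exist inside INT.
    static Map<String, String> suffixesInsideInt() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("INT", "[0-9]+[qQlLhHbBiI]?");
        rules.put("ID", ID_REGEX);
        return rules;
    }

    public static void main(String[] args) {
        System.out.println(winner(implicitLiteralsFirst(), "b")); // 'b'
        System.out.println(winner(implicitLiteralsFirst(), "c")); // ID
        System.out.println(winner(suffixesInsideInt(), "b"));     // ID
    }
}
```

In this model a lone "b" is claimed by the literal token in the original layout, while "c" falls through to ID; once the suffix letters live only inside INT, "b" is an ordinary ID again.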
