ANTLR parse assignments

Posted 2019-08-09 16:22

Question:

I want to parse some assignments, where I only care about each assignment as a whole, not about what's inside it. An assignment is indicated by ':='. (EDIT: other things may come before and after the assignments.)

Some examples:

a := TRUE & FALSE;
c := a ? 3 : 5;
b := case 
          a : 1;
          !a : 0;
        esac;

Currently I distinguish between assignments that contain a 'case' and all other assignments. For simple assignments I tried something like ~('case' | 'esac' | ';'), but then ANTLR complained about unmatched tokens (like '=').

assignment : 
   NAME ':='! expression ;

expression : 
    ( simple_expression | case_expression) ;


simple_expression : 
    ((OPERATOR | NAME) & ~('case' | 'esac'))+  ';'! ;

case_expression : 
    'case' .+ 'esac' ';'! ;
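The intent behind simple_expression and case_expression (everything after := up to the terminating ';', except that a ';' inside case ... esac does not count) can be sketched as a plain token scan outside ANTLR. This is a minimal Python illustration, not an ANTLR solution; the token pattern and the name split_assignments are made up for it. Tokens that are not part of an assignment are skipped, which covers the EDIT about other content before and after the assignments:

```python
import re

# Crude tokenizer for this sketch: ':=' is tried first so it is not split into
# ':' and '=', then identifiers, numbers, and any other single non-space character.
TOKEN_RE = re.compile(r":=|[A-Za-z_]\w*|\d+|\S")
NAME_RE = re.compile(r"[A-Za-z_]\w*")


def split_assignments(text):
    """Return a (name, body_tokens) pair for every NAME := ... ; in text.

    case ... esac is treated as a (possibly nested) block, so a ';' inside it
    does not end the assignment. Tokens outside assignments are skipped.
    """
    tokens = TOKEN_RE.findall(text)
    result = []
    i = 0
    while i < len(tokens):
        starts_assignment = (
            i + 1 < len(tokens)
            and tokens[i + 1] == ":="
            and NAME_RE.fullmatch(tokens[i]) is not None
        )
        if not starts_assignment:
            i += 1  # something other than an assignment: ignore it
            continue
        body, depth, j = [], 0, i + 2
        # Collect tokens until a ';' that is not inside case ... esac.
        while j < len(tokens) and not (tokens[j] == ";" and depth == 0):
            if tokens[j] == "case":
                depth += 1
            elif tokens[j] == "esac":
                depth -= 1
            body.append(tokens[j])
            j += 1
        result.append((tokens[i], body))
        i = j + 1  # continue after the terminating ';'
    return result


source = """x : boolean;
a := TRUE & FALSE;
c := a ? 3 : 5;
b := case
          a : 1;
          !a : 0;
        esac;"""
for name, body in split_assignments(source):
    print(name, " ".join(body))
# a TRUE & FALSE
# c a ? 3 : 5
# b case a : 1 ; ! a : 0 ; esac
```

Such a scan only finds where each assignment starts and ends; looking inside the expressions still needs a real grammar.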

I tried the following replacement, because the Eclipse interpreter did not seem to accept the '&' (the "and") in ((OPERATOR | NAME) & ~('case' | 'esac'))+ ';'! ;

   (~(OPERATOR | ~NAME | ('case' | 'esac')) |
    ~(~OPERATOR | NAME | ('case' | 'esac')) |
    ~(~OPERATOR | ~NAME | ('case' | 'esac')))  ';'!

But this does not work; I get:

"error(139): /AntlrTutorial/src/foo/NusmvInput.g:78:5: set complement is empty |---> ~(~OPERATOR | ~NAME | ('case' | 'esac'))) EOC! ;"

How can I parse it?
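The error(139) message quoted above is plain set arithmetic. Inside a parser rule, ~ complements a set of token types with respect to all token types the grammar defines. OPERATOR and NAME are different token types, so every token type lies outside at least one of them: ~OPERATOR | ~NAME already covers the whole vocabulary, and the outer ~ leaves nothing. A small Python sketch evaluates the three alternatives over a made-up five-token vocabulary (a larger vocabulary gives the same results):

```python
# Hypothetical token vocabulary for the question's grammar: EOC is the ';' token
# named in the error message, and 'case'/'esac' are assumed to be token types of
# their own. The real grammar defines more types; the results below do not change.
VOCAB = {"OPERATOR", "NAME", "EOC", "'case'", "'esac'"}
KEYWORDS = {"'case'", "'esac'"}


def complement(token_set):
    """What ~(...) means in a parser rule: every token type not in the set."""
    return VOCAB - token_set


# The three alternatives from the question, evaluated as set expressions.
alt1 = complement({"OPERATOR"} | complement({"NAME"}) | KEYWORDS)
alt2 = complement(complement({"OPERATOR"}) | {"NAME"} | KEYWORDS)
alt3 = complement(complement({"OPERATOR"}) | complement({"NAME"}) | KEYWORDS)

print(alt1)  # {'NAME'}
print(alt2)  # {'OPERATOR'}
print(alt3)  # set()
```

The first two alternatives reduce to plain NAME and OPERATOR, so, assuming 'case' and 'esac' are token types of their own, the ~('case' | 'esac') part excluded nothing that NAME or OPERATOR could match.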

Answer 1:

There are a couple of things going wrong here:

  • you're using a bare & in your grammar, while it should be quoted: '&'
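  • unless you know exactly what you're doing, don't use ~ and . (especially not .+ !) inside parser rules; use them in lexer rules only. Inside a parser rule, ~ negates a set of token types and . matches any single token, so they rarely do what you expect there;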
  • create lexer rules for 'case' and 'esac' instead of using them as literals in your parser rules. Literal tokens in parser rules are safe when no other lexer rule can potentially match them, but 'case' and 'esac' also match NAME, and since they could end up in your AST, it's better to define them explicitly in the lexer yourself (before NAME).
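The last point comes down to how a lexer chooses between rules that match the same text. A common strategy, and roughly what ANTLR's lexer does for simple rules like these, is to prefer the longest match and, when two rules match equally long input, the rule defined first. The Python sketch below imitates that strategy with the lexer rules of the demo grammar T (tokenize and the regex translations are written for this illustration; this is not ANTLR's generated lexer):

```python
import re

# The lexer rules of grammar T, in the same order, as Python regexes.
RULES = [
    ("TRUE", r"TRUE"), ("FALSE", r"FALSE"), ("CASE", r"case"), ("ESAC", r"esac"),
    ("ASSIGN", r":="), ("AND", r"&"), ("OR", r"\|"), ("NOT", r"!"),
    ("QMARK", r"\?"), ("COL", r":"), ("SCOL", r";"),
    ("NAME", r"[a-zA-Z]+"), ("NUMBER", r"[0-9]+"), ("SPACE", r"[ \t\r\n]+"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in RULES]


def tokenize(text, rules=COMPILED):
    """Longest match wins; on a tie, the rule listed first wins. SPACE is skipped."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in rules:
            m = regex.match(text, pos)
            # Only a strictly longer match replaces the current best, so ties
            # keep the earlier rule.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches {text[pos]!r} at position {pos}")
        if best_name != "SPACE":
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("case"))       # [('CASE', 'case')]   CASE is listed before NAME
print(tokenize("cases"))      # [('NAME', 'cases')]  the longer NAME match wins
print(tokenize("b := case"))  # [('NAME', 'b'), ('ASSIGN', ':='), ('CASE', 'case')]

# Same rules with NAME moved to the front: now the keyword is lexed as a NAME.
names_first = sorted(COMPILED, key=lambda rule: rule[0] != "NAME")
print(tokenize("case", names_first))  # [('NAME', 'case')]
```

Moving NAME in front of CASE turns the keyword into a NAME, which is why keyword rules such as CASE and ESAC are listed before NAME.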

Here's a quick demo:

grammar T;

options {
  output=AST;
}

tokens {
  ROOT;
  CASES;
  CASE;
}

parse
 : (assignment SCOL)* EOF -> ^(ROOT assignment*)
 ;

assignment 
 : NAME ASSIGN^ expression 
 ;

expression
 : ternary_expression
 ;

ternary_expression
 : or_expression (QMARK^ ternary_expression COL! ternary_expression)?
 ;

or_expression
 : unary_expression ((AND | OR)^ unary_expression)*
 ;

unary_expression
 : NOT^ atom
 | atom
 ;

atom
 : TRUE
 | FALSE
 | NUMBER
 | NAME
 | CASE single_case+ ESAC -> ^(CASES single_case+)
 | '(' expression ')'     -> expression
 ;

single_case
 : expression COL expression SCOL -> ^(CASE expression expression)
 ;

TRUE   : 'TRUE';
FALSE  : 'FALSE';
CASE   : 'case';
ESAC   : 'esac';
ASSIGN : ':='; 
AND    : '&';
OR     : '|';
NOT    : '!';
QMARK  : '?';
COL    : ':';
SCOL   : ';';
NAME   : ('a'..'z' | 'A'..'Z')+;
NUMBER : ('0'..'9')+;
SPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();};

which will parse your input:

a := TRUE & FALSE;
c := a ? 3 : 5;
b := case 
          a : 1;
          !a : 0;
        esac;

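as follows (in the LISP-style notation of ANTLR's toStringTree()):

(ROOT (:= a (& TRUE FALSE)) (:= c (? a 3 5)) (:= b (CASES (CASE a 1) (CASE (! a) 0))))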
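For readers without ANTLR 3 at hand, the tree shape that grammar T's rewrite rules build can be checked with a hand-written recursive-descent parser. This Python sketch mirrors the grammar rule for rule (each method is commented with the rule it stands for); the class, the token pattern and to_string_tree are made up for this illustration, it is not code generated by ANTLR, and to_string_tree only imitates the parenthesized format of toStringTree():

```python
import re

# Made-up token pattern standing in for grammar T's lexer; whitespace is skipped.
TOKEN_RE = re.compile(r":=|[A-Za-z]+|[0-9]+|[&|!?:;()]")
RESERVED = ("TRUE", "FALSE", "case", "esac")

class Parser:
    """Recursive descent mirroring grammar T; nodes are (label, children) tuples."""
    def __init__(self, text):
        self.tokens = TOKEN_RE.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    # parse : (assignment SCOL)* EOF -> ^(ROOT assignment*)
    def parse(self):
        assignments = []
        while self.peek() is not None:
            assignments.append(self.assignment())
            self.take(";")  # SCOL is matched but not kept in the tree
        return ("ROOT", assignments)

    # assignment : NAME ASSIGN^ expression
    def assignment(self):
        name = self.take()
        if not name.isalpha() or name in RESERVED:
            raise SyntaxError(f"expected a NAME, got {name!r}")
        op = self.take(":=")
        return (op, [name, self.expression()])

    # expression -> ternary_expression : or_expression (QMARK^ ternary_expression COL! ternary_expression)?
    def expression(self):
        condition = self.or_expression()
        if self.peek() != "?":
            return condition
        op = self.take("?")
        then_branch = self.expression()
        self.take(":")  # COL! : consumed but left out of the tree
        return (op, [condition, then_branch, self.expression()])

    # or_expression : unary_expression ((AND | OR)^ unary_expression)*
    def or_expression(self):
        left = self.unary_expression()
        while self.peek() in ("&", "|"):
            op = self.take()
            left = (op, [left, self.unary_expression()])  # ^ : the operator becomes the new root
        return left

    # unary_expression : NOT^ atom | atom
    def unary_expression(self):
        if self.peek() == "!":
            return (self.take("!"), [self.atom()])
        return self.atom()

    # atom : TRUE | FALSE | NUMBER | NAME | CASE single_case+ ESAC -> ^(CASES single_case+)
    #      | '(' expression ')' -> expression
    def atom(self):
        tok = self.peek()
        if tok == "case":
            self.take("case")
            cases = [self.single_case()]
            while self.peek() != "esac":
                cases.append(self.single_case())
            self.take("esac")
            return ("CASES", cases)
        if tok == "(":
            self.take("(")
            inner = self.expression()
            self.take(")")
            return inner
        if tok is not None and tok.isalnum() and tok not in ("case", "esac"):
            return self.take()  # TRUE, FALSE, NUMBER or NAME: a leaf node
        raise SyntaxError(f"unexpected token {tok!r}")

    # single_case : expression COL expression SCOL -> ^(CASE expression expression)
    def single_case(self):
        condition = self.expression()
        self.take(":")
        value = self.expression()
        self.take(";")
        return ("CASE", [condition, value])

def to_string_tree(node):
    # LISP-style rendering that imitates the format of ANTLR's toStringTree().
    if isinstance(node, str):
        return node
    return "(" + " ".join([node[0]] + [to_string_tree(c) for c in node[1]]) + ")"

example = "a := TRUE & FALSE;\nc := a ? 3 : 5;\nb := case\n  a : 1;\n  !a : 0;\nesac;"
print(to_string_tree(Parser(example).parse()))
# (ROOT (:= a (& TRUE FALSE)) (:= c (? a 3 5)) (:= b (CASES (CASE a 1) (CASE (! a) 0))))
```

Note how ((AND | OR)^ unary_expression)* yields a left-nested tree: x := a & b | c; becomes (:= x (| (& a b) c)).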