Antlr Skip text outside tag

Posted 2019-07-26 07:53

Question:

I'm trying to skip/ignore the text outside a custom tag:

This text is a unique token to skip <?compo \5+5\ ?> also this <?compo \1+1\ ?>

I tried the following lexer:

TAG_OPEN    : '<?compo '    -> pushMode(COMPOSER);

mode COMPOSER;

TAG_CLOSE   : ' ?>'         -> popMode;

NUMBER_DIGIT    : '1'..'9';
ZERO            : '0';

LOGICOP
        : OR
        | AND
        ;

COMPAREOP
        : EQ
        | NE
        | GT
        | GE
        | LT
        | LE
        ;

    WS          : ' ';
    NEWLINE     : ('\r\n'|'\n'|'\r');
    TAB         : ('\t');

...

and parser:

instructions
        : (TAG_OPEN statement TAG_CLOSE)+?;

statement
        : if_statement
        | else
        | else_if
        | if_end
        | operation_statement
        | mnemonic
        | comment
        | transparent;

But it doesn't work (I tested it with the IntelliJ tester on the rule "instructions")...

I have also added some skip rules outside the "COMPOSER" mode:

TEXT_SKIP : TAG_CLOSE .*? (TAG_OPEN | EOF)  -> skip;

But I don't get any results...

Can someone help me?

EDIT:

I changed "instructions", and now the parse tree is built correctly for every instruction in every tag:

instructions : (.*? TAG_OPEN statement TAG_CLOSE .*?)+;

But I get a "character not recognized" error for the text outside the tags...

Answer 1:

Below is a quick demo that worked for me.

Lexer grammar:

lexer grammar CompModeLexer;

TAG_OPEN
 : '<?compo' -> pushMode(COMPOSER)
 ;

OTHER
 : . -> skip
 ;

mode COMPOSER;

  TAG_CLOSE
   : '?>' -> popMode
   ;

  OPAR
   : '('
   ;

  CPAR
   : ')'
   ;

  INT
   : '0'
   | [1-9] [0-9]*
   ;

  LOGICOP
   : 'AND'
   | 'OR'
   ;

  COMPAREOP
   : [<>!] '='
   | [<>=]
   ;

  MULTOP
   : [*/%]
   ;

  ADDOP
   : [+-]
   ;

  SPACE
   : [ \t\r\n\f] -> skip
   ;
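
To see why the catch-all OTHER rule removes the errors for text outside the tags, it can help to trace the mode switching by hand. The Python sketch below is not ANTLR output. It is a hand-written imitation of the lexer grammar above, assuming a plain mode stack and "longest match wins, earlier rule wins ties" selection. The names RULES and tokenize are invented for this illustration.

```python
import re

# Each mode maps to an ordered list of (token name, regex, action).
# Actions imitate ANTLR lexer commands: "push:<mode>", "pop", "skip", or None.
RULES = {
    "DEFAULT": [
        ("TAG_OPEN", r"<\?compo", "push:COMPOSER"),
        ("OTHER", r".", "skip"),
    ],
    "COMPOSER": [
        ("TAG_CLOSE", r"\?>", "pop"),
        ("OPAR", r"\(", None),
        ("CPAR", r"\)", None),
        ("INT", r"0|[1-9][0-9]*", None),
        ("LOGICOP", r"AND|OR", None),
        ("COMPAREOP", r"[<>!]=|[<>=]", None),
        ("MULTOP", r"[*/%]", None),
        ("ADDOP", r"[+-]", None),
        ("SPACE", r"[ \t\r\n\f]", "skip"),
    ],
}


def tokenize(text):
    """Return (name, lexeme) pairs, switching modes like pushMode/popMode."""
    modes = ["DEFAULT"]
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern, action in RULES[modes[-1]]:
            m = re.compile(pattern, re.DOTALL).match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group(), action)
        if best is None:
            raise ValueError(f"unrecognized character {text[pos]!r} at {pos}")
        name, lexeme, action = best
        pos += len(lexeme)
        if action != "skip":
            tokens.append((name, lexeme))
        if action == "pop":
            modes.pop()
        elif action and action.startswith("push:"):
            modes.append(action[5:])
    return tokens


sample = "This text is a unique token to skip <?compo 5+5 ?> also this <?compo 1+1 ?>"
for token in tokenize(sample):
    print(token)
```

In this imitation, every character outside a tag is consumed by OTHER and dropped, so only the tokens between TAG_OPEN and TAG_CLOSE reach the caller. Input without any tag yields an empty token list rather than an error.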

Parser grammar:

parser grammar CompModeParser;

options {
  tokenVocab=CompModeLexer;
}

parse
 : tag* EOF
 ;

tag
 : TAG_OPEN statement TAG_CLOSE
 ;

statement
 : expr
 ;

expr
 : '(' expr ')'
 | expr MULTOP expr
 | expr ADDOP expr
 | expr COMPAREOP expr
 | expr LOGICOP expr
 | INT
 ;

A test with the input "This text is a unique token to skip <?compo 5+5 ?> also this <?compo 1+1 ?>" resulted in the following tree:

[parse tree image not preserved]

Answer 2:

I found another solution (not as elegant as the previous one):

  1. Create a generic TEXT token in the general context (so outside the tag's mode)

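    // Caution: "-> skip" makes the lexer discard TEXT, so a parser rule that
    // references TEXT never receives it; drop the command if the parser should match it.
    TEXT : ( ~[<] | '<' ~[?])+ -> skip;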
    
  2. Create a parser rule to handle generic text

    code
        : TEXT
        | (TEXT? instruction TEXT?)+;
    
  3. Create a parser rule to handle an instruction

    instruction
        : TAG_OPEN statement TAG_CLOSE;
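
The TEXT rule from step 1 can be tried out in isolation. The sketch below translates its pattern into a Python regular expression, assuming the two alternatives map directly onto regex alternation. The names TEXT and leading_text are made up for this example, and the regex only approximates what the ANTLR lexer does.

```python
import re

# Python rendering of the ANTLR rule  TEXT : ( ~[<] | '<' ~[?] )+ ;
# either one character that is not '<', or a '<' followed by a non-'?'.
TEXT = re.compile(r"(?:[^<]|<[^?])+")


def leading_text(source):
    """Return the run of plain text at the start of source ('' if none)."""
    match = TEXT.match(source)
    return match.group() if match else ""


sample = "This text is a unique token to skip <?compo 5+5 ?> also this"
print(repr(leading_text(sample)))
print(repr(leading_text("a < b <?compo 1 ?>")))
print(repr(leading_text("x<<?compo 1 ?>")))
```

The first two calls stop right before "<?compo". The third shows a weakness of the pattern: when a '<' stands directly before the tag, the second alternative consumes "<<" as a pair and the tag opener is swallowed as ordinary text.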
    


Tags: antlr antlr4