ANTLR: call a rule from a different grammar

2019-03-16 07:06发布

is it possible to invoke a rule from a different grammar?
the purpose is to have two languages in the same file, the second language starting by an (begin ...) where ... is in the second language. the grammar should invoke another grammar to parse that second language.

for example:


grammar A;

start_rule
    :    '(' 'begin' B.program ')' //or something like that
    ;


grammar B;

program
    :   something* EOF
    ;

something
    : ...
    ;

1条回答
ゆ 、 Hurt°
2楼-- · 2019-03-16 07:48

Your question could be interpreted in (at least) two ways:

  1. separate rules from a large grammar into separate grammars;
  2. parse a separate language inside your "main" language (island grammar).

I assume it's the first, in which case you can import grammars.

A demo for option 1:

file: L.g

lexer grammar L;

Digit
  :  '0'..'9'
  ;

file: Sub.g

parser grammar Sub;

number
  :  Digit+
  ;

file: Root.g

grammar Root;

import Sub;

parse
  :  number EOF {System.out.println("Parsed: " + $number.text);}
  ;

file: Main.java

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    L lexer = new L(new ANTLRStringStream("42"));
    CommonTokenStream tokens = new CommonTokenStream(lexer);
    RootParser parser = new RootParser(tokens);
    parser.parse();
  }
}

Run the demo:

bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool L.g
bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp antlr-3.3.jar org.antlr.Tool Root.g 
bart@hades:~/Programming/ANTLR/Demos/Composite$ javac -cp antlr-3.3.jar *.java
bart@hades:~/Programming/ANTLR/Demos/Composite$ java -cp .:antlr-3.3.jar Main

which will print:

Parsed: 42

to the console.

More info, see: http://www.antlr.org/wiki/display/ANTLR3/Composite+Grammars

A demo for option 2:

A nice example of a language inside a language is regex. You have the "normal" regex language with its meta characters, but there's another one in it: the language that describes a character set (or character class).

Instead of accounting for the meta characters of a character set (range -, negation ^, etc.) inside your regex-grammar, you could simply consider a character set as a single token consisting of a [ and then everything up to and including ] (with possibly \] in it!) inside your regex-grammar. When you then stumble upon a CharSet token in one of your parser rules, you invoke the CharSet-parser.

file: Regex.g

grammar Regex;

options { 
  output=AST;
}

tokens {
  REGEX;
  ATOM;
  CHARSET;
  INT;
  GROUP;
  CONTENTS;
}

@members {
  public static CommonTree ast(String source) throws RecognitionException {
    RegexLexer lexer = new RegexLexer(new ANTLRStringStream(source));
    RegexParser parser = new RegexParser(new CommonTokenStream(lexer));
    return (CommonTree)parser.parse().getTree();
  }
}

parse
  :  atom+ EOF -> ^(REGEX atom+)
  ;

atom
  :  group quantifier?     -> ^(ATOM group quantifier?)
  |  EscapeSeq quantifier? -> ^(ATOM EscapeSeq quantifier?)
  |  Other quantifier?     -> ^(ATOM Other quantifier?)
  |  CharSet quantifier?   -> ^(CHARSET {CharSetParser.ast($CharSet.text)} quantifier?)
  ;

group
  :  '(' atom+ ')' -> ^(GROUP atom+)
  ;

quantifier
  :  '+'
  |  '*'
  ;

CharSet
  :  '[' (('\\' .) | ~('\\' | ']'))+ ']'
  ;

EscapeSeq
  :  '\\' .
  ;

Other
  :  ~('\\' | '(' | ')' | '[' | ']' | '+' | '*')
  ;

file: CharSet.g

grammar CharSet;

options { 
  output=AST;
}

tokens {
  NORMAL_CHAR_SET;
  NEGATED_CHAR_SET;
  RANGE;
}

@members {
  public static CommonTree ast(String source) throws RecognitionException {
    CharSetLexer lexer = new CharSetLexer(new ANTLRStringStream(source));
    CharSetParser parser = new CharSetParser(new CommonTokenStream(lexer));
    return (CommonTree)parser.parse().getTree();
  }
}

parse
  :  OSqBr ( normal  -> ^(NORMAL_CHAR_SET normal)
           | negated -> ^(NEGATED_CHAR_SET negated)
           ) 
     CSqBr
  ;

normal
  :  (EscapeSeq | Hyphen | Other) atom* Hyphen?
  ;

negated
  :  Caret normal -> normal
  ;

atom
  :  EscapeSeq
  |  Caret
  |  Other
  |  range
  ;

range
  :  from=Other Hyphen to=Other -> ^(RANGE $from $to)
  ;

OSqBr
      :  '['
  ;

CSqBr
  :  ']'
  ;

EscapeSeq
  :  '\\' .
  ;

Caret
  :  '^'
  ;

Hyphen
  :  '-'
  ;

Other
  :  ~('-' | '\\' | '[' | ']')
  ;

file: Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
    DOTTreeGenerator gen = new DOTTreeGenerator();
    StringTemplate st = gen.toDOT(tree);
    System.out.println(st);
  }
}

And if you run the main class, you will see the DOT output for the regex ((xyz)*[^\\da-f])foo which is the following tree:

enter image description here

The magic is inside the Regex.g grammar in the atom rule where I inserted a tree node in a rewrite rule by invoking the static ast method from the CharSetParser class:

CharSet ... -> ^(... {CharSetParser.ast($CharSet.text)} ...)

Note that inside such rewrite rules, there must not be a semi colon! So, this would be wrong: {CharSetParser.ast($CharSet.text);}.

EDIT

And here's how to create tree walkers for both grammars:

file: RegexWalker.g

tree grammar RegexWalker;

options {
  tokenVocab=Regex;
  ASTLabelType=CommonTree;
}

walk
  :  ^(REGEX atom+) {System.out.println("REGEX: " + $start.toStringTree());}
  ;

atom
  :  ^(ATOM group quantifier?)
  |  ^(ATOM EscapeSeq quantifier?)
  |  ^(ATOM Other quantifier?)
  |  ^(CHARSET t=. quantifier?) {CharSetWalker.walk($t);}
  ;

group
  :  ^(GROUP atom+)
  ;

quantifier
  :  '+'
  |  '*'
  ;

file: CharSetWalker.g

tree grammar CharSetWalker;

options {
  tokenVocab=CharSet;
  ASTLabelType=CommonTree;
}

@members {
  public static void walk(CommonTree tree) {
    try {
      CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
      CharSetWalker walker = new CharSetWalker(nodes);
      walker.walk();
    } catch(Exception e) {
      e.printStackTrace();
    }
  }
}

walk
  :  ^(NORMAL_CHAR_SET normal)  {System.out.println("NORMAL_CHAR_SET: " + $start.toStringTree());}
  |  ^(NEGATED_CHAR_SET normal) {System.out.println("NEGATED_CHAR_SET: " + $start.toStringTree());}
  ;

normal
  :  (EscapeSeq | Hyphen | Other) atom* Hyphen?
  ;

atom
  :  EscapeSeq
  |  Caret
  |  Other
  |  range
  ;

range
  :  ^(RANGE Other Other)
  ;

Main.java

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;

public class Main {
  public static void main(String[] args) throws Exception {
    CommonTree tree = RegexParser.ast("((xyz)*[^\\da-f])foo");
    CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
    RegexWalker walker = new RegexWalker(nodes);
    walker.walk();
  }
}

To run the demo, do:

java -cp antlr-3.3.jar org.antlr.Tool CharSet.g 
java -cp antlr-3.3.jar org.antlr.Tool Regex.g
java -cp antlr-3.3.jar org.antlr.Tool CharSetWalker.g
java -cp antlr-3.3.jar org.antlr.Tool RegexWalker.g 
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main

which will print:

NEGATED_CHAR_SET: (NEGATED_CHAR_SET \d (RANGE a f))
REGEX: (REGEX (ATOM (GROUP (ATOM (GROUP (ATOM x) (ATOM y) (ATOM z)) *) (CHARSET (NEGATED_CHAR_SET \d (RANGE a f))))) (ATOM f) (ATOM o) (ATOM o))
查看更多
登录 后发表回答