ANTLR doesn't give correct output tokens for S

Published 2019-04-28 16:59

Question:

I am new to Scala and am trying to parse Scala files using the Scala grammar for ANTLR. Below is the grammar, which I got from this GitHub link:

https://github.com/antlr/grammars-v4/tree/master/scala

The repo might be moved, so I am pasting the Scala grammar here:

grammar Scala;

literal           : '-'? IntegerLiteral
                | '-'? FloatingPointLiteral
                | BooleanLiteral
                | CharacterLiteral
                | StringLiteral
                | SymbolLiteral
                | 'null' ;

qualId            : Id ('.' Id)* ;

ids               : Id (',' Id)* ;

stableId          : (Id | (Id '.')? 'this') '.' Id
                | (Id '.')? 'super' classQualifier? '.' Id ;

classQualifier    : '[' Id ']' ;

type              : functionArgTypes '=>' type
                | infixType existentialClause? ;

functionArgTypes  : infixType
                | '(' ( paramType (',' paramType )* )? ')' ;

existentialClause : 'forSome' '{' existentialDcl (Semi existentialDcl)* '}';

existentialDcl    : 'type' typeDcl
                | 'val' valDcl;

infixType         : compoundType (Id Nl? compoundType)*;

compoundType      : annotType ('with' annotType)* refinement?
                | refinement;

annotType         : simpleType annotation*;

simpleType        : simpleType typeArgs
                | simpleType '#' Id
                | stableId
                | (stableId | (Id '.')? 'this') '.' 'type'
                | '(' types ')';

typeArgs          : '[' types ']';

types             : type (',' type)*;

refinement        : Nl? '{' refineStat (Semi refineStat)* '}';

refineStat        : dcl
                | 'type' typeDef
                | ;

typePat           : type;

ascription        : ':' infixType
                | ':' annotation+
                | ':' '_' '*';

expr              : (bindings | 'implicit'? Id | '_') '=>' expr
                | expr1 ;

expr1             : 'if' '(' expr ')' Nl* expr (Semi? 'else' expr)?
                | 'while' '(' expr ')' Nl* expr
                | 'try' ('{' block '}' | expr) ('catch' '{' caseClauses '}')? ('finally' expr)?
                | 'do' expr Semi? 'while' '(' expr ')'
                | 'for' ('(' enumerators ')' | '{' enumerators '}') Nl* 'yield'? expr
                | 'throw' expr
                | 'return' expr?
                | (('new' (classTemplate | templateBody)| blockExpr | simpleExpr1 '_'?) '.') Id '=' expr
                | simpleExpr1 argumentExprs '=' expr
                | postfixExpr
                | postfixExpr ascription
                | postfixExpr 'match' '{' caseClauses '}' ;

postfixExpr       : infixExpr (Id Nl?)? ;

infixExpr         : prefixExpr
                | infixExpr Id Nl? infixExpr ;

prefixExpr        : ('-' | '+' | '~' | '!')?
                  ('new' (classTemplate | templateBody)| blockExpr | simpleExpr1 '_'?) ;

simpleExpr1       : literal
                | stableId
                | (Id '.')? 'this'
                | '_'
                | '(' exprs? ')'
                | ('new' (classTemplate | templateBody) | blockExpr ) '.' Id
                | ('new' (classTemplate | templateBody) | blockExpr ) typeArgs
                | simpleExpr1 argumentExprs
      ;

exprs             : expr (',' expr)* ;

argumentExprs     : '(' exprs? ')'
                | '(' (exprs ',')? postfixExpr ':' '_' '*' ')'
                | Nl? blockExpr ;

blockExpr         : '{' caseClauses '}'
                | '{' block '}' ;
block             : blockStat (Semi blockStat)* resultExpr? ;

blockStat         : import_
                | annotation* ('implicit' | 'lazy')? def
                | annotation* localModifier* tmplDef
                | expr1
                | ;

resultExpr        : expr1
                | (bindings | ('implicit'? Id | '_') ':' compoundType) '=>' block ;

enumerators       : generator (Semi generator)* ;

generator         : pattern1 '<-' expr (Semi? guard | Semi pattern1 '=' expr)* ;

caseClauses       : caseClause+ ;

caseClause        : 'case' pattern guard? '=>' block ;

guard             : 'if' postfixExpr ;

pattern           : pattern1 ('|' pattern1 )* ;

pattern1          : Varid ':' typePat
                | '_' ':' typePat
                | pattern2 ;

pattern2          : Varid ('@' pattern3)?
                | pattern3 ;

pattern3          : simplePattern
                | simplePattern (Id Nl? simplePattern)* ;

simplePattern     : '_'
                | Varid
                | literal
                | stableId ('(' patterns ')')?
                | stableId '(' (patterns ',')? (Varid '@')? '_' '*' ')'
                | '(' patterns? ')' ;

patterns          : pattern (',' patterns)*
                | '_' * ;

typeParamClause   : '[' variantTypeParam (',' variantTypeParam)* ']' ;

funTypeParamClause: '[' typeParam (',' typeParam)* ']' ;

variantTypeParam  : annotation? ('+' | '-')? typeParam ;

typeParam         : (Id | '_') typeParamClause? ('>:' type)? ('<:' type)?
                  ('<%' type)* (':' type)* ;

paramClauses      : paramClause* (Nl? '(' 'implicit' params ')')? ;

paramClause       : Nl? '(' params? ')' ;

params            : param (',' param)* ;

param             : annotation* Id (':' paramType)? ('=' expr)? ;

paramType         : type
                | '=>' type
                | type '*';

classParamClauses : classParamClause*
                  (Nl? '(' 'implicit' classParams ')')? ;

classParamClause  : Nl? '(' classParams? ')' ;

classParams       : classParam (',' classParam)* ;

classParam        : annotation* modifier* ('val' | 'var')?
                  Id ':' paramType ('=' expr)? ;

bindings          : '(' binding (',' binding )* ')' ;

binding           : (Id | '_') (':' type)? ;

modifier          : localModifier
                | accessModifier
                | 'override' ;

localModifier     : 'abstract'
                | 'final'
                | 'sealed'
                | 'implicit'
                | 'lazy' ;

accessModifier    : ('private' | 'protected') accessQualifier? ;

accessQualifier   : '[' (Id | 'this') ']' ;

annotation        : '@' simpleType argumentExprs* ;

constrAnnotation  : '@' simpleType argumentExprs ;

templateBody      : Nl? '{' selfType? templateStat (Semi templateStat)* '}' ;

templateStat      : import_
                | (annotation Nl?)* modifier* def
                | (annotation Nl?)* modifier* dcl
                |  expr
                | ;

selfType          : Id (':' type)? '=>'
                | 'this' ':' type '=>' ;

import_           : 'import' importExpr (',' importExpr)* ;

importExpr        : stableId '.' (Id | '_' | importSelectors) ;

importSelectors   : '{' (importSelector ',')* (importSelector | '_') '}' ;

importSelector    : Id ('=>' Id | '=>' '_') ;

dcl               : 'val' valDcl
                | 'var' varDcl
                | 'def' funDcl
                | 'type' Nl* typeDcl ;

valDcl            : ids ':' type ;

varDcl            : ids ':' type ;

funDcl            : funSig (':' type)? ;

funSig            : Id funTypeParamClause? paramClauses ;

typeDcl           : Id typeParamClause? ('>:' type)? ('<:' type)? ;

patVarDef         : 'val' patDef
                | 'var' varDef ;

def               : patVarDef
                | 'def' funDef
                | 'type' Nl* typeDef
                | tmplDef ;

patDef            : pattern2 (',' pattern2)* (':' type)* '=' expr ;

varDef            : patDef
                | ids ':' type '=' '_' ;

funDef            : funSig (':' type)? '=' expr
                | funSig Nl? '{' block '}'
                | 'this' paramClause paramClauses
                  ('=' constrExpr | Nl constrBlock) ;

typeDef           :  Id typeParamClause? '=' type ;

tmplDef           : 'case'? 'class' classDef
                | 'case' 'object' objectDef
                | 'trait' traitDef ;

classDef          : Id typeParamClause? constrAnnotation* accessModifier?
                  classParamClauses classTemplateOpt ;

traitDef          : Id typeParamClause? traitTemplateOpt ;

objectDef         : Id classTemplateOpt ;

classTemplateOpt  : 'extends' classTemplate | ('extends'? templateBody)? ;

traitTemplateOpt  : 'extends' traitTemplate | ('extends'? templateBody)? ;

classTemplate     : earlyDefs? classParents templateBody? ;

traitTemplate     : earlyDefs? traitParents templateBody? ;

classParents      : constr ('with' annotType)* ;

traitParents      : annotType ('with' annotType)* ;

constr            : annotType argumentExprs* ;

earlyDefs         : '{' (earlyDef (Semi earlyDef)*)? '}' 'with' ;

earlyDef          : (annotation Nl?)* modifier* patVarDef ;

constrExpr        : selfInvocation
                | constrBlock ;

constrBlock       : '{' selfInvocation (Semi blockStat)* '}' ;
selfInvocation    : 'this' argumentExprs+ ;

topStatSeq        : topStat (Semi topStat)* ;

topStat           : (annotation Nl?)* modifier* tmplDef
                | import_
                | packaging
                | packageObject
                | ;

packaging         : 'package' qualId Nl? '{' topStatSeq '}' ;

packageObject     : 'package' 'object' objectDef ;

compilationUnit   : ('package' qualId Semi)* topStatSeq ;

// Lexer
BooleanLiteral   :  'true' | 'false';
CharacterLiteral :  '\'' (PrintableChar | CharEscapeSeq) '\'';
StringLiteral    :  '"' StringElement* '"'
               |  '"""' MultiLineChars '"""';
SymbolLiteral    :  '\'' Plainid;
IntegerLiteral   :  (DecimalNumeral | HexNumeral) ('L' | 'l');
FloatingPointLiteral
               :  Digit+ '.' Digit+ ExponentPart? FloatType?
               |  '.' Digit+ ExponentPart? FloatType?
               |  Digit ExponentPart FloatType?
               |  Digit+ ExponentPart? FloatType;
Id               :  Plainid
               |  '`' StringLiteral '`';
Varid            :  Lower Idrest;
Nl               :  '\r'? '\n';
Semi             :  ';' |  Nl+;

Paren            :  '(' | ')' | '[' | ']' | '{' | '}';
Delim            :  '`' | '\'' | '"' | '.' | ';' | ',' ;

Comment          :  '/*' .*?  '*/'
               |  '//' .*? Nl;

// fragments
fragment UnicodeEscape    : '\\' 'u' 'u'? HexDigit HexDigit HexDigit HexDigit ;
fragment WhiteSpace       :  '\u0020' | '\u0009' | '\u000D' | '\u000A';
fragment Opchar           : PrintableChar // printableChar not matched by (whiteSpace | upper | lower |
                        // letter | digit | paren | delim | opchar | Unicode_Sm | Unicode_So)
                        ;
fragment Op               :  Opchar+;
fragment Plainid          :  Upper Idrest
                        |  Varid
                        |  Op;
fragment Idrest           :  (Letter | Digit)* ('_' Op)?;

fragment StringElement    :  '\u0020'| '\u0021'|'\u0023' .. '\u007F'  // (PrintableChar  Except '"')
                        |  CharEscapeSeq;
fragment MultiLineChars   :  ('"'? '"'? .*?)* '"'*;

fragment HexDigit         :  '0' .. '9'  |  'A' .. 'Z'  |  'a' .. 'z' ;
fragment FloatType        :  'F' | 'f' | 'D' | 'd';
fragment Upper            :  'A'  ..  'Z' | '$' | '_';  // and Unicode category Lu
fragment Lower            :  'a' .. 'z'; // and Unicode category Ll
fragment Letter           :  Upper | Lower; // and Unicode categories Lo, Lt, Nl
fragment ExponentPart     :  ('E' | 'e') ('+' | '-')? Digit+;
fragment PrintableChar    : '\u0020' .. '\u007F' ;
fragment CharEscapeSeq    : '\\' ('b' | 't' | 'n' | 'f' | 'r' | '"' | '\'' | '\\');
fragment DecimalNumeral   :  '0' | NonZeroDigit Digit*;
fragment HexNumeral       :  '0' 'x' HexDigit HexDigit+;
fragment Digit            :  '0' | NonZeroDigit;
fragment NonZeroDigit     :  '1' .. '9';

The Scala grammar above is the same as the syntax summary I got from the official Scala website:

http://www.scala-lang.org/files/archive/spec/2.11/13-syntax-summary.html

Now I am trying to generate tokens for a Scala file named scala.scala. The contents of that file are below:

object HelloWorld {
  def main(args: Array[String]) {
    println("Hello, world!")
  }
}

I am running one of the following commands to get the tokens:

grun Scala compilationUnit -tokens scala.scala

or

grun Scala expr -tokens scala.scala

or

grun Scala literal -tokens scala.scala

The output I got is:

[@0,0:18='object HelloWorld {',<68>,1:0]
[@1,19:19='\n',<70>,1:19]
[@2,20:52='  def main(args: Array[String]) {',<68>,2:0]
[@3,53:53='\n',<70>,2:33]
[@4,54:81='    println("Hello, world!")',<68>,3:0]
[@5,82:82='\n',<70>,3:28]
[@6,83:85='  }',<68>,4:0]
[@7,86:86='\n',<70>,4:3]
[@8,87:87='}',<14>,5:0]
[@9,88:88='\n',<70>,5:1]
[@10,89:88='<EOF>',<-1>,6:0]
line 1:19 no viable alternative at input 'object HelloWorld {\n'
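
To make the dump above easier to read, here is a minimal Python sketch that decodes a few of its lines. The line layout `[@index,start:stop='text',<type>,line:column]` is taken from the output itself, and the type names are copied by hand from the Scala.tokens listing in this post. The names `TOKEN_LINE`, `TYPE_NAMES` and `decode` are only illustrative helpers, not part of ANTLR.

```python
import re

# Layout of one dump line: [@index,start:stop='text',<type>,line:column]
TOKEN_LINE = re.compile(
    r"\[@(\d+),(\d+):(-?\d+)='(.*)',<(-?\d+)>,(\d+):(\d+)\]"
)

# Type numbers copied by hand from the Scala.tokens listing in this post.
TYPE_NAMES = {68: "Id", 70: "Nl", 14: "'}'", -1: "EOF"}


def decode(line):
    """Turn one dump line into a small dict, or None if it does not match."""
    m = TOKEN_LINE.fullmatch(line.strip())
    if m is None:
        return None
    index, _start, _stop, text, ttype, row, _col = m.groups()
    return {
        "index": int(index),
        "text": text,
        "type": TYPE_NAMES.get(int(ttype), ttype),
        "line": int(row),
    }


dump = [
    r"[@0,0:18='object HelloWorld {',<68>,1:0]",
    r"[@1,19:19='\n',<70>,1:19]",
    r"[@8,87:87='}',<14>,5:0]",
    r"[@10,89:88='<EOF>',<-1>,6:0]",
]
decoded = [decode(line) for line in dump]
for tok in decoded:
    print(tok["type"], repr(tok["text"]))
```

Read this way, every source line except the last closing brace was emitted as a single Id token, and each line break became an Nl token.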

The output in tree form looks like this:

(expr object HelloWorld { \n   def main(args: Array[String]) { \n     println("Hello, world!") \n   } \n } \n)

and the output in the GUI looks like this (image not preserved):

This looks completely wrong. Instead of tokens, it is simply giving me whole lines of code. I tested the grammars for Java and C, and they work perfectly: they give me the correct tokens I expect. Those grammars come from the following link:

https://github.com/antlr/grammars-v4

Please correct me if I am doing something wrong, because I am new to ANTLR and Scala.

By tokens I mean all the keywords, operands and operators. As I understand it, a token is never meant to be simply a whole line of code.
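
To show the kind of split I am expecting, here is a rough Python sketch of a hand-written tokenizer. It is not ANTLR and not the Scala lexical syntax: the keyword set, the regular expression and the `tokenize` function are simplified assumptions made only for this illustration.

```python
import re

# Tiny keyword subset, assumed only for this illustration.
KEYWORDS = {"object", "def"}

# One alternative per token kind; whitespace is matched so it can be skipped.
TOKEN = re.compile(
    r'(?P<string>"[^"\n]*")'
    r"|(?P<word>[A-Za-z_$][A-Za-z0-9_$]*)"
    r"|(?P<punct>[{}()\[\]:.,;])"
    r"|(?P<ws>\s+)"
)


def tokenize(src):
    """Split src into (kind, text) pairs, dropping whitespace."""
    out = []
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"unexpected character {src[pos]!r} at {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "ws":
            continue
        if kind == "word" and m.group() in KEYWORDS:
            kind = "keyword"
        out.append((kind, m.group()))
    return out


for kind, text in tokenize("object HelloWorld {"):
    print(kind, text)
```

For the first line of scala.scala this yields three separate tokens (`object`, `HelloWorld`, `{`) instead of one token covering the whole line.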

Below is the Scala.tokens file that ANTLR generated from Scala.g4 (the Scala grammar).



T__0=1
T__1=2
T__2=3
T__3=4
T__4=5
T__5=6
T__6=7
T__7=8
T__8=9
T__9=10
T__10=11
T__11=12
T__12=13
T__13=14
T__14=15
T__15=16
T__16=17
T__17=18
T__18=19
T__19=20
T__20=21
T__21=22
T__22=23
T__23=24
T__24=25
T__25=26
T__26=27
T__27=28
T__28=29
T__29=30
T__30=31
T__31=32
T__32=33
T__33=34
T__34=35
T__35=36
T__36=37
T__37=38
T__38=39
T__39=40
T__40=41
T__41=42
T__42=43
T__43=44
T__44=45
T__45=46
T__46=47
T__47=48
T__48=49
T__49=50
T__50=51
T__51=52
T__52=53
T__53=54
T__54=55
T__55=56
T__56=57
T__57=58
T__58=59
T__59=60
T__60=61
BooleanLiteral=62
CharacterLiteral=63
StringLiteral=64
SymbolLiteral=65
IntegerLiteral=66
FloatingPointLiteral=67
Id=68
Varid=69
Nl=70
Semi=71
Paren=72
Delim=73
Comment=74
'-'=1
'null'=2
'.'=3
','=4
'this'=5
'super'=6
'['=7
']'=8
'=>'=9
'('=10
')'=11
'forSome'=12
'{'=13
'}'=14
'type'=15
'val'=16
'with'=17
'#'=18
':'=19
'_'=20
'*'=21
'implicit'=22
'if'=23
'else'=24
'while'=25
'try'=26
'catch'=27
'finally'=28
'do'=29
'for'=30
'yield'=31
'throw'=32
'return'=33
'new'=34
'='=35
'match'=36
'+'=37
'~'=38
'!'=39
'lazy'=40
'<-'=41
'case'=42
'|'=43
'@'=44
'>:'=45
'<:'=46
'<%'=47
'var'=48
'override'=49
'abstract'=50
'final'=51
'sealed'=52
'private'=53
'protected'=54
'import'=55
'def'=56
'class'=57
'object'=58
'trait'=59
'extends'=60
'package'=61
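
The listing above is plain `name=number` lines, so the numeric types in a token dump can be looked up mechanically. The following Python sketch builds a number-to-names map from a handful of lines copied from the listing. The function name `parse_tokens_file` is made up for this example, and splitting on the last `=` is an assumption based only on the lines shown here.

```python
def parse_tokens_file(text):
    """Map each token type number to the names that share it."""
    by_type = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the LAST '=' so that literals such as '=' or '=>' survive.
        name, _, number = line.rpartition("=")
        by_type.setdefault(int(number), []).append(name)
    return by_type


# A few lines copied from the Scala.tokens listing in this post.
sample = """T__13=14
Id=68
Nl=70
'}'=14
'='=35
'=>'=9
"""
vocab = parse_tokens_file(sample)
for number in sorted(vocab):
    print(number, vocab[number])
```

With this map, type 68 resolves to Id, type 70 to Nl, and type 14 to both the implicit name T__13 and the literal '}'.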

I am sure that these tokens are not correct. Can anyone confirm whether the problem is with the Scala grammar or with ANTLR?
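
One possible reading of the posted grammar, offered as a guess rather than a confirmed diagnosis: Id can reach Opchar+ through Plainid and Op, and Opchar is defined as PrintableChar, which is the range '\u0020' .. '\u007F' and so includes spaces, letters and brackets. An ANTLR lexer prefers the longest match, so Id may be swallowing everything up to the line break. The Python sketch below only imitates that one behavior by cutting the input into maximal runs of characters in that range. It is not the ANTLR lexer, and `printable` and `split_runs` are made-up helper names.

```python
def printable(ch):
    # PrintableChar in the posted grammar: '\u0020' .. '\u007F'
    return "\u0020" <= ch <= "\u007f"


def split_runs(src):
    """Cut src into maximal runs of printable characters.

    Any character outside the range (such as a newline) becomes
    a one-character piece of its own.
    """
    pieces = []
    pos = 0
    while pos < len(src):
        end = pos
        while end < len(src) and printable(src[end]):
            end += 1
        if end == pos:
            end = pos + 1  # non-printable character, e.g. '\n'
        pieces.append(src[pos:end])
        pos = end
    return pieces


source = (
    "object HelloWorld {\n"
    "  def main(args: Array[String]) {\n"
    '    println("Hello, world!")\n'
    "  }\n"
    "}\n"
)
runs = split_runs(source)
for piece in runs:
    print(repr(piece))
```

The pieces this produces have the same text as tokens @0 to @9 in the dump, which would be consistent with the grammar's Opchar rule being too broad rather than with a fault in ANTLR itself.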