I am new to Scala and I am trying to parse Scala files with the use of Scala Grammar and ANTLR. Below is the code for Scala Grammar which I got from the git hub link:
https://github.com/antlr/grammars-v4/tree/master/scala
There are chances of repo to be moved so I am pasting the Scala grammar code here:
grammar Scala;
literal : '-'? IntegerLiteral
| '-'? FloatingPointLiteral
| BooleanLiteral
| CharacterLiteral
| StringLiteral
| SymbolLiteral
| 'null' ;
qualId : Id ('.' Id)* ;
ids : Id (',' Id)* ;
stableId : (Id | (Id '.')? 'this') '.' Id
| (Id '.')? 'super' classQualifier? '.' Id ;
classQualifier : '[' Id ']' ;
type : functionArgTypes '=>' type
| infixType existentialClause? ;
functionArgTypes : infixType
| '(' ( paramType (',' paramType )* )? ')' ;
existentialClause : 'forSome' '{' existentialDcl (Semi existentialDcl)* '}';
existentialDcl : 'type' typeDcl
| 'val' valDcl;
infixType : compoundType (Id Nl? compoundType)*;
compoundType : annotType ('with' annotType)* refinement?
| refinement;
annotType : simpleType annotation*;
simpleType : simpleType typeArgs
| simpleType '#' Id
| stableId
| (stableId | (Id '.')? 'this') '.' 'type'
| '(' types ')';
typeArgs : '[' types ']';
types : type (',' type)*;
refinement : Nl? '{' refineStat (Semi refineStat)* '}';
refineStat : dcl
| 'type' typeDef
| ;
typePat : type;
ascription : ':' infixType
| ':' annotation+
| ':' '_' '*';
expr : (bindings | 'implicit'? Id | '_') '=>' expr
| expr1 ;
expr1 : 'if' '(' expr ')' Nl* expr (Semi? 'else' expr)?
| 'while' '(' expr ')' Nl* expr
| 'try' ('{' block '}' | expr) ('catch' '{' caseClauses '}')? ('finally' expr)?
| 'do' expr Semi? 'while' '(' expr ')'
| 'for' ('(' enumerators ')' | '{' enumerators '}') Nl* 'yield'? expr
| 'throw' expr
| 'return' expr?
| (('new' (classTemplate | templateBody)| blockExpr | simpleExpr1 '_'?) '.') Id '=' expr
| simpleExpr1 argumentExprs '=' expr
| postfixExpr
| postfixExpr ascription
| postfixExpr 'match' '{' caseClauses '}' ;
postfixExpr : infixExpr (Id Nl?)? ;
infixExpr : prefixExpr
| infixExpr Id Nl? infixExpr ;
prefixExpr : ('-' | '+' | '~' | '!')?
('new' (classTemplate | templateBody)| blockExpr | simpleExpr1 '_'?) ;
simpleExpr1 : literal
| stableId
| (Id '.')? 'this'
| '_'
| '(' exprs? ')'
| ('new' (classTemplate | templateBody) | blockExpr ) '.' Id
| ('new' (classTemplate | templateBody) | blockExpr ) typeArgs
| simpleExpr1 argumentExprs
;
exprs : expr (',' expr)* ;
argumentExprs : '(' exprs? ')'
| '(' (exprs ',')? postfixExpr ':' '_' '*' ')'
| Nl? blockExpr ;
blockExpr : '{' caseClauses '}'
| '{' block '}' ;
block : blockStat (Semi blockStat)* resultExpr? ;
blockStat : import_
| annotation* ('implicit' | 'lazy')? def
| annotation* localModifier* tmplDef
| expr1
| ;
resultExpr : expr1
| (bindings | ('implicit'? Id | '_') ':' compoundType) '=>' block ;
enumerators : generator (Semi generator)* ;
generator : pattern1 '<-' expr (Semi? guard | Semi pattern1 '=' expr)* ;
caseClauses : caseClause+ ;
caseClause : 'case' pattern guard? '=>' block ;
guard : 'if' postfixExpr ;
pattern : pattern1 ('|' pattern1 )* ;
pattern1 : Varid ':' typePat
| '_' ':' typePat
| pattern2 ;
pattern2 : Varid ('@' pattern3)?
| pattern3 ;
pattern3 : simplePattern
| simplePattern (Id Nl? simplePattern)* ;
simplePattern : '_'
| Varid
| literal
| stableId ('(' patterns ')')?
| stableId '(' (patterns ',')? (Varid '@')? '_' '*' ')'
| '(' patterns? ')' ;
patterns : pattern (',' patterns)*
| '_' * ;
typeParamClause : '[' variantTypeParam (',' variantTypeParam)* ']' ;
funTypeParamClause: '[' typeParam (',' typeParam)* ']' ;
variantTypeParam : annotation? ('+' | '-')? typeParam ;
typeParam : (Id | '_') typeParamClause? ('>:' type)? ('<:' type)?
('<%' type)* (':' type)* ;
paramClauses : paramClause* (Nl? '(' 'implicit' params ')')? ;
paramClause : Nl? '(' params? ')' ;
params : param (',' param)* ;
param : annotation* Id (':' paramType)? ('=' expr)? ;
paramType : type
| '=>' type
| type '*';
classParamClauses : classParamClause*
(Nl? '(' 'implicit' classParams ')')? ;
classParamClause : Nl? '(' classParams? ')' ;
classParams : classParam (',' classParam)* ;
classParam : annotation* modifier* ('val' | 'var')?
Id ':' paramType ('=' expr)? ;
bindings : '(' binding (',' binding )* ')' ;
binding : (Id | '_') (':' type)? ;
modifier : localModifier
| accessModifier
| 'override' ;
localModifier : 'abstract'
| 'final'
| 'sealed'
| 'implicit'
| 'lazy' ;
accessModifier : ('private' | 'protected') accessQualifier? ;
accessQualifier : '[' (Id | 'this') ']' ;
annotation : '@' simpleType argumentExprs* ;
constrAnnotation : '@' simpleType argumentExprs ;
templateBody : Nl? '{' selfType? templateStat (Semi templateStat)* '}' ;
templateStat : import_
| (annotation Nl?)* modifier* def
| (annotation Nl?)* modifier* dcl
| expr
| ;
selfType : Id (':' type)? '=>'
| 'this' ':' type '=>' ;
import_ : 'import' importExpr (',' importExpr)* ;
importExpr : stableId '.' (Id | '_' | importSelectors) ;
importSelectors : '{' (importSelector ',')* (importSelector | '_') '}' ;
importSelector : Id ('=>' Id | '=>' '_') ;
dcl : 'val' valDcl
| 'var' varDcl
| 'def' funDcl
| 'type' Nl* typeDcl ;
valDcl : ids ':' type ;
varDcl : ids ':' type ;
funDcl : funSig (':' type)? ;
funSig : Id funTypeParamClause? paramClauses ;
typeDcl : Id typeParamClause? ('>:' type)? ('<:' type)? ;
patVarDef : 'val' patDef
| 'var' varDef ;
def : patVarDef
| 'def' funDef
| 'type' Nl* typeDef
| tmplDef ;
patDef : pattern2 (',' pattern2)* (':' type)* '=' expr ;
varDef : patDef
| ids ':' type '=' '_' ;
funDef : funSig (':' type)? '=' expr
| funSig Nl? '{' block '}'
| 'this' paramClause paramClauses
('=' constrExpr | Nl constrBlock) ;
typeDef : Id typeParamClause? '=' type ;
tmplDef : 'case'? 'class' classDef
| 'case' 'object' objectDef
| 'trait' traitDef ;
classDef : Id typeParamClause? constrAnnotation* accessModifier?
classParamClauses classTemplateOpt ;
traitDef : Id typeParamClause? traitTemplateOpt ;
objectDef : Id classTemplateOpt ;
classTemplateOpt : 'extends' classTemplate | ('extends'? templateBody)? ;
traitTemplateOpt : 'extends' traitTemplate | ('extends'? templateBody)? ;
classTemplate : earlyDefs? classParents templateBody? ;
traitTemplate : earlyDefs? traitParents templateBody? ;
classParents : constr ('with' annotType)* ;
traitParents : annotType ('with' annotType)* ;
constr : annotType argumentExprs* ;
earlyDefs : '{' (earlyDef (Semi earlyDef)*)? '}' 'with' ;
earlyDef : (annotation Nl?)* modifier* patVarDef ;
constrExpr : selfInvocation
| constrBlock ;
constrBlock : '{' selfInvocation (Semi blockStat)* '}' ;
selfInvocation : 'this' argumentExprs+ ;
topStatSeq : topStat (Semi topStat)* ;
topStat : (annotation Nl?)* modifier* tmplDef
| import_
| packaging
| packageObject
| ;
packaging : 'package' qualId Nl? '{' topStatSeq '}' ;
packageObject : 'package' 'object' objectDef ;
compilationUnit : ('package' qualId Semi)* topStatSeq ;
// Lexer
BooleanLiteral : 'true' | 'false';
CharacterLiteral : '\'' (PrintableChar | CharEscapeSeq) '\'';
StringLiteral : '"' StringElement* '"'
| '"""' MultiLineChars '"""';
SymbolLiteral : '\'' Plainid;
IntegerLiteral : (DecimalNumeral | HexNumeral) ('L' | 'l');
FloatingPointLiteral
: Digit+ '.' Digit+ ExponentPart? FloatType?
| '.' Digit+ ExponentPart? FloatType?
| Digit ExponentPart FloatType?
| Digit+ ExponentPart? FloatType;
Id : Plainid
| '`' StringLiteral '`';
Varid : Lower Idrest;
Nl : '\r'? '\n';
Semi : ';' | Nl+;
Paren : '(' | ')' | '[' | ']' | '{' | '}';
Delim : '`' | '\'' | '"' | '.' | ';' | ',' ;
Comment : '/*' .*? '*/'
| '//' .*? Nl;
// fragments
fragment UnicodeEscape : '\\' 'u' 'u'? HexDigit HexDigit HexDigit HexDigit ;
fragment WhiteSpace : '\u0020' | '\u0009' | '\u000D' | '\u000A';
fragment Opchar : PrintableChar // printableChar not matched by (whiteSpace | upper | lower |
// letter | digit | paren | delim | opchar | Unicode_Sm | Unicode_So)
;
fragment Op : Opchar+;
fragment Plainid : Upper Idrest
| Varid
| Op;
fragment Idrest : (Letter | Digit)* ('_' Op)?;
fragment StringElement : '\u0020'| '\u0021'|'\u0023' .. '\u007F' // (PrintableChar Except '"')
| CharEscapeSeq;
fragment MultiLineChars : ('"'? '"'? .*?)* '"'*;
fragment HexDigit : '0' .. '9' | 'A' .. 'Z' | 'a' .. 'z' ;
fragment FloatType : 'F' | 'f' | 'D' | 'd';
fragment Upper : 'A' .. 'Z' | '$' | '_'; // and Unicode category Lu
fragment Lower : 'a' .. 'z'; // and Unicode category Ll
fragment Letter : Upper | Lower; // and Unicode categories Lo, Lt, Nl
fragment ExponentPart : ('E' | 'e') ('+' | '-')? Digit+;
fragment PrintableChar : '\u0020' .. '\u007F' ;
fragment CharEscapeSeq : '\\' ('b' | 't' | 'n' | 'f' | 'r' | '"' | '\'' | '\\');
fragment DecimalNumeral : '0' | NonZeroDigit Digit*;
fragment HexNumeral : '0' 'x' HexDigit HexDigit+;
fragment Digit : '0' | NonZeroDigit;
fragment NonZeroDigit : '1' .. '9';
The above Scala grammar is same as what I got from Scala official website:
http://www.scala-lang.org/files/archive/spec/2.11/13-syntax-summary.html
Now I am trying to generate tokens for a scala file named scala.scala. Code for that file is below :
object HelloWorld {
def main(args: Array[String]) {
println("Hello, world!")
}
}
I am running the following command to get the tokens :
grun Scala compilationUnit -tokens scala.scala
or
grun Scala expr -tokens scala.scala
or
grun Scala literal -tokens scala.scala
The output I got is:
[@0,0:18='object HelloWorld {',<68>,1:0]
[@1,19:19='\n',<70>,1:19]
[@2,20:52=' def main(args: Array[String]) {',<68>,2:0]
[@3,53:53='\n',<70>,2:33]
[@4,54:81=' println("Hello, world!")',<68>,3:0]
[@5,82:82='\n',<70>,3:28]
[@6,83:85=' }',<68>,4:0]
[@7,86:86='\n',<70>,4:3]
[@8,87:87='}',<14>,5:0]
[@9,88:88='\n',<70>,5:1]
[@10,89:88='<EOF>',<-1>,6:0]
line 1:19 no viable alternative at input 'object HelloWorld {\n'
Output in the tree form is like this :
(expr object HelloWorld { \n def main(args: Array[String]) { \n println("Hello, world!") \n } \n } \n)
and output in the gui is like this :
That is completely stupid. In place of tokens it's giving me simply LOC . I tested it for the other languages Java and C and it works perfect. It gives me correct output/correct tokens which are expected for the following grammar links:
https://github.com/antlr/grammars-v4
Please correct me If I am doing something wrong because I am new to Antlr and Scala.
What I meant from token is all keywords,operands and all operators are there. According to me it's never meant to be simply Lines of Code.
Below is the Scala.tokens file which I got using Scala.g4(Scala Grammar with ANTLR).
T__0=1
T__1=2
T__2=3
T__3=4
T__4=5
T__5=6
T__6=7
T__7=8
T__8=9
T__9=10
T__10=11
T__11=12
T__12=13
T__13=14
T__14=15
T__15=16
T__16=17
T__17=18
T__18=19
T__19=20
T__20=21
T__21=22
T__22=23
T__23=24
T__24=25
T__25=26
T__26=27
T__27=28
T__28=29
T__29=30
T__30=31
T__31=32
T__32=33
T__33=34
T__34=35
T__35=36
T__36=37
T__37=38
T__38=39
T__39=40
T__40=41
T__41=42
T__42=43
T__43=44
T__44=45
T__45=46
T__46=47
T__47=48
T__48=49
T__49=50
T__50=51
T__51=52
T__52=53
T__53=54
T__54=55
T__55=56
T__56=57
T__57=58
T__58=59
T__59=60
T__60=61
BooleanLiteral=62
CharacterLiteral=63
StringLiteral=64
SymbolLiteral=65
IntegerLiteral=66
FloatingPointLiteral=67
Id=68
Varid=69
Nl=70
Semi=71
Paren=72
Delim=73
Comment=74
'-'=1
'null'=2
'.'=3
','=4
'this'=5
'super'=6
'['=7
']'=8
'=>'=9
'('=10
')'=11
'forSome'=12
'{'=13
'}'=14
'type'=15
'val'=16
'with'=17
'#'=18
':'=19
'_'=20
'*'=21
'implicit'=22
'if'=23
'else'=24
'while'=25
'try'=26
'catch'=27
'finally'=28
'do'=29
'for'=30
'yield'=31
'throw'=32
'return'=33
'new'=34
'='=35
'match'=36
'+'=37
'~'=38
'!'=39
'lazy'=40
'<-'=41
'case'=42
'|'=43
'@'=44
'>:'=45
'<:'=46
'<%'=47
'var'=48
'override'=49
'abstract'=50
'final'=51
'sealed'=52
'private'=53
'protected'=54
'import'=55
'def'=56
'class'=57
'object'=58
'trait'=59
'extends'=60
'package'=61
I am sure that these tokens are not correct. Can anyone make sure is this problem with the Scala Gramma or with the ANTLR?