I am following this tutorial and successfully replicated its behavior except that I am using Antlr 4.7 instead of the 4.5 that the tutorial was using.
I am trying to build a DSL for expense tracker.
Was wondering if each element can have attributes?
E.g. this is what it looks like now
This is the code for the todo.g4 as seen in https://github.com/simkimsia/learn-antlr-web-js/blob/master/todo.g4
grammar todo;
elements
: (element|emptyLine)* EOF
;
element
: '*' ( ' ' | '\t' )* CONTENT NL+
;
emptyLine
: NL
;
NL
: '\r' | '\n'
;
CONTENT
: [a-zA-Z0-9_][a-zA-Z0-9_ \t]*
;
Meaning to say the element will also have 2 attributes such as amount and payee. To keep it simple, I will have the same sentence structure so to allow parsing to be done more easily.
the format will be pay [payee] [amount]
the example is pay Acme Corp 123,789.45
so the payee is Acme Corp and the amount is 12378945 as expressed in integers to denote the amount in denominations of cents
another example is pay Banana Inc 700
so the payee is Banana Inc and the amount is 70000 as expressed in integers to denote the amount in denominations of cents
I am guessing I need to change the todo.g4 and then re generate the parser.
Can an element have other attributes? If so, how do I get started?
UPDATE
This is my latest attempts ranked with latest updates on top:
I just figured out how to use grun and testRig. Thanks @Raven for that tip.
latest attempt: My latest expense.g4 (only difference from earlier attempt is the regex for payment)
grammar expense;
payments: (payment NL)* ;
payment: PAY receiver amount=NUMBER ;
receiver: surname=ID (lastname=ID)? ;
PAY: 'pay' ;
NUMBER: ([0-9]+(','[0-9]+)*)('.'[0-9]*)?;
ID: [a-zA-Z0-9_]+ ;
NL: '\n' | '\r\n' ;
WS: [\t ]+ -> skip ;
Earlier attempt: This is my expense.g4
grammar expense;
payments: (payment NL)* ;
payment: PAY receiver amount=NUMBER ;
receiver: surname=ID (lastname=ID)? ;
PAY: 'pay' ;
NUMBER: [0-9]+ (',' [0-9]+)+ ('.' [0-9]+)? ;
ID: [a-zA-Z0-9_]+ ;
NL: '\n' | '\r\n' ;
WS: [\t ]+ -> skip ;
Earlier attempt: https://github.com/simkimsia/learn-antlr-web-js/commit/728813ac275a3f2ad16d7f51ce15fcc27d40045b#commitcomment-25127606
Earlier attempt: https://github.com/simkimsia/learn-antlr-web-js/commit/0c32aec6ffb4b4275db86d54e9788058a2ce8759#commitcomment-25125695
I'm not entirely sure what exactly you want but for the provided examples this grammar should do the job:
If this is what you were asking for I will add some more explanation if needed...
Situation on October 24. 2017 at 19:00 UTC+1.
Your grammar works perfectly. I made a full test in Java.
File
Expense.g4
:File
ExpenseMyListener.java
:File
test_expense.java
:Input file
top.text
:Execution :
Of course regenerate after each change. For me it's :
where
The more you describe specific contents, the more you may face ambiguity problems. For example, you have two rules :
Consider the following content :
* pay Federico Tomassetti 10
is ambiguous and can be matched by the two rules, but it will finally be parsed as free text, because of€ for the tutorial
which doesn't satisfypayment
.If later you change the
payment
rule to accept more info after the amount :the above content will be matched by
payment
(in case of ambiguity ANTLR chooses the first rule). The good news is that ANTLR 4 is very strong to disambiguate, it reads the whole file if necessary.For ambiguous tokens and precedence rules, read the posts of these last three weeks, a lot have been said.
Mixing Raven's grammar with yours, this is one possible solution :
File
Question.g4
File
t.text
Execution :
As you can see, the € symbol is not recognized. You may need a
CONTENT
rule similar toFIELDTEXT
here, and then you get into trouble ...Federico's Mega tutorial is a good start. For nitty-gritty details, see The Definitive ANTLR 4 Reference or the online doc from www.antlr.org.