I want to define an HTML/Markdown-like grammar for a document that gets transformed into an AST. I'm aware that ANTLR4 is not the best tool for Markdown-style parsing, but my format is much closer to HTML. At least I think it is. :)
Here's my lexer definition:
lexer grammar dnpMDLexer;
NL
: [\r\n]
;
HEAD_TAG
: '#'
;
HEADING_TEXT
: ('\\#'|~[#`\r\n])+
;
ITALIC_TAG
: '*'
;
ITALIC_TEXT
: ('\\*'|~[#`*\r\n]).+?
;
LISTING_TAG
: '`'
;
RUNNING_TEXT
: ('\\#'|'\\`'|'\\*'|~[#*`])+
;
And here's my parser definition:
parser grammar dnpMDParser;
options { tokenVocab=dnpMDLexer; }
dnpMD
: subheadline headline lead body
;
subheadline
: HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG NL
;
headline
: HEAD_TAG HEADING_TEXT HEAD_TAG NL
;
lead
: HEAD_TAG HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG
;
subheading
: HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG
;
listing
: LISTING_TAG LISTING_TAG LISTING_TAG LISTING_TAG .+? LISTING_TAG LISTING_TAG LISTING_TAG LISTING_TAG
;
italic
: ITALIC_TAG ITALIC_TEXT ITALIC_TAG
;
body
: RUNNING_TEXT body
| subheading body
| listing body
| italic body
| EOF
;
I tried this stuff in ANTLRworks2 and IntelliJ with the ANTLR4 plugin.
I'm having serious problems with the listing and italic rules: they match far too much in some cases and nothing in others. In the version above, the italic rule does not work.
Am I heading in the right direction? I tried to use the HTML grammar as a template. I'm not quite sure whether ANTLR4 modes could help me distinguish between the outer text and the inner text of tags.
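For illustration only, this is roughly what a mode-based lexer for the italic case might look like. It is a minimal sketch, not the grammar from this post and not a tested solution: the grammar name `ModeSketchLexer`, the mode name `ITALIC`, and the token names `ITALIC_OPEN`, `ITALIC_CLOSE` and `TEXT` are all hypothetical, and headings and listings are left out entirely.

```antlr
// Hypothetical sketch: every name in this grammar is invented for illustration.
lexer grammar ModeSketchLexer;

// Default mode ("outer" text). A bare '*' opens an italic span
// and switches the lexer into the ITALIC mode.
ITALIC_OPEN
    : '*' -> pushMode(ITALIC)
    ;

// Running text: an escaped star or any character other than '*'.
TEXT
    : ( '\\*' | ~[*] )+
    ;

// The rules below are only active while the lexer is in ITALIC mode.
mode ITALIC;

// The closing '*' returns to the mode that was active before.
ITALIC_CLOSE
    : '*' -> popMode
    ;

// "Inner" text: like TEXT, but confined to a single line.
ITALIC_TEXT
    : ( '\\*' | ~[*\r\n] )+
    ;
```

With a lexer along these lines, a parser rule such as `italic : ITALIC_OPEN ITALIC_TEXT ITALIC_CLOSE ;` would only ever see `ITALIC_TEXT` between the two markers, because the outer `TEXT` rule is not active inside the mode. Note that this sketch has no rule for a line break inside ITALIC mode, so an unterminated `*` would produce a token recognition error rather than recover.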
Maybe someone has some useful hints. I'd be thankful for any hint I can get, because I'm not sure that my current approach is leading in the right direction.
Here's an image of the TestRig within ANTLRworks2. The second italic rule is matching far too much.
Thanks, Fabian
The current solution uses these lexer and parser rules:
This language is working fine, and maybe someone can benefit from it.
Thanks to all who helped out! Fabian