HTML/Markdown style grammar for ANTLR4

Posted 2019-03-04 20:03

I want to define an HTML/Markdown-like grammar for a document format that gets transformed into an AST. I'm aware that ANTLR4 is not the best tool for Markdown, but my format is much closer to HTML. At least I think it is. :)

Here's my lexer definition:

lexer grammar dnpMDLexer;

NL
    : [\r\n]
    ;

HEAD_TAG
    : '#'
    ;

HEADING_TEXT
    : ('\\#'|~[#`\r\n])+
    ;

ITALIC_TAG
    : '*'
    ;

ITALIC_TEXT
    : ('\\*'|~[#`*\r\n]).+?
    ;

LISTING_TAG
    : '`'
    ;

RUNNING_TEXT
    : ('\\#'|'\\`'|'\\*'|~[#*`])+
    ;

And here's my parser definition:

parser grammar dnpMDParser;

options { tokenVocab=dnpMDLexer; }

dnpMD
    : subheadline headline lead body
    ;

subheadline
    : HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG NL
    ;

headline
    : HEAD_TAG HEADING_TEXT HEAD_TAG NL
    ;

lead
    : HEAD_TAG HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG
    ;

subheading
    : HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG HEADING_TEXT HEAD_TAG HEAD_TAG HEAD_TAG HEAD_TAG
    ;

listing
     : LISTING_TAG LISTING_TAG LISTING_TAG LISTING_TAG .+? LISTING_TAG LISTING_TAG LISTING_TAG LISTING_TAG
     ;

italic
    : ITALIC_TAG ITALIC_TEXT ITALIC_TAG
    ;

body
    : RUNNING_TEXT body
    | subheading body
    | listing body
    | italic body
    | EOF
    ;

I tried this stuff in ANTLRworks2 and IntelliJ with the ANTLR4 plugin.

I have serious problems with the listing and italic rules: they match far too much in some cases and nothing in others. In the version above, italic does not work at all.
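
One likely cause of the italic problem: ANTLR4's lexer runs independently of the parser and, at each position, emits the token of whichever rule matches the most characters, with ties going to the rule defined first. The Python sketch below mimics that policy using hypothetical regex approximations of the lexer rules above; it is not ANTLR itself. ITALIC_TEXT is left out because its trailing non-greedy `.+?` has no faithful regex equivalent (it cannot start with `*` anyway). Because HEADING_TEXT also accepts `*`, it beats ITALIC_TAG whenever it can match more text.

```python
import re

# Hypothetical regex approximations of the question's lexer rules, in grammar order.
# ITALIC_TEXT is deliberately omitted (see the lead-in).
RULES = [
    ("NL", re.compile(r"[\r\n]")),
    ("HEAD_TAG", re.compile(r"#")),
    ("HEADING_TEXT", re.compile(r"(?:\\#|[^#`\r\n])+")),
    ("ITALIC_TAG", re.compile(r"\*")),
    ("LISTING_TAG", re.compile(r"`")),
    ("RUNNING_TEXT", re.compile(r"(?:\\#|\\`|\\\*|[^#*`])+")),
]


def tokenize(text):
    """Longest match wins; on a tie the rule listed first wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so an earlier rule keeps a tie.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


print(tokenize("*bar*"))
# [('HEADING_TEXT', '*bar*')]
print(tokenize("foo *bar*\nbaz"))
# [('HEADING_TEXT', 'foo *bar*'), ('RUNNING_TEXT', '\nbaz')]
```

In the first call the parser gets a single HEADING_TEXT token and never sees ITALIC_TAG; in the second, the newline before `baz` also ends up inside RUNNING_TEXT instead of becoming NL.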

Am I heading in the right direction? I tried to use the HTML grammar as a template. I'm not sure whether ANTLR4 lexer modes could help me distinguish between the text outside and the text inside tags.
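
Lexer modes are ANTLR4's mechanism for this kind of context switch: a rule with `-> pushMode(X)` makes the lexer use only the rules of mode `X` until a rule with `-> popMode` returns to the previous mode, so text between delimiters can get its own token type. The Python sketch below only emulates that mechanism and is not ANTLR code; the mode name ITALIC, the token names ITALIC_OPEN, ITALIC_CLOSE and TEXT, and the `\*` escape handling are hypothetical choices for illustration.

```python
import re

# Hypothetical rule sets: each mode lists (token name, regex, mode action),
# emulating ANTLR's `-> pushMode(X)` and `-> popMode` lexer commands.
MODES = {
    "DEFAULT": [
        ("NL", re.compile(r"\r?\n"), None),
        ("ITALIC_OPEN", re.compile(r"\*"), ("push", "ITALIC")),
        ("TEXT", re.compile(r"(?:\\\*|[^*\r\n])+"), None),
    ],
    "ITALIC": [
        ("ITALIC_CLOSE", re.compile(r"\*"), ("pop", None)),
        ("ITALIC_TEXT", re.compile(r"(?:\\\*|[^*\r\n])+"), None),
    ],
}


def tokenize(text):
    stack, pos, tokens = ["DEFAULT"], 0, []
    while pos < len(text):
        best = None
        # Only the rules of the current (top-of-stack) mode are considered.
        for name, pattern, action in MODES[stack[-1]]:
            m = pattern.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group(), action)
        if best is None:
            raise ValueError(f"no rule in mode {stack[-1]} matches at offset {pos}")
        name, lexeme, action = best
        tokens.append((name, lexeme))
        pos += len(lexeme)
        if action and action[0] == "push":
            stack.append(action[1])
        elif action and action[0] == "pop":
            stack.pop()
    return tokens


print(tokenize("a *b* c"))
# [('TEXT', 'a '), ('ITALIC_OPEN', '*'), ('ITALIC_TEXT', 'b'), ('ITALIC_CLOSE', '*'), ('TEXT', ' c')]
```

Inside the ITALIC mode only ITALIC_CLOSE and ITALIC_TEXT exist, so a `*` can no longer be swallowed by a catch-all text rule. In this sketch an unclosed `*` raises ValueError at the end of the line, because the ITALIC mode defines no newline rule. In an actual ANTLR lexer grammar, the ITALIC rules would live in a section opened with `mode ITALIC;`.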

Maybe someone has some useful hints. I'm thankful for any hint I can get, because I'm not 100% sure that my current approach is leading me in the right direction.

Here's a screenshot of the TestRig in ANTLRworks2. The second italic rule matches far too much.


Thanks, Fabian

1 answer
Juvenile、少年°
Post #2 · 2019-03-04 20:48

The current solution uses the following lexer and parser grammars:

lexer grammar dnpMDAuslagernLexer;

/*@members {
    public static final int COMMENTS = 1;
}*/

NL
    : [\r\n]
    ;

SUBHEADLINE
    : '##' (~[\r\n])+? '##'
    ;

HEADLINE
    : '#' ('\\#'|~[\r\n])+? '#'
    ;

LEAD
    : '###' (~[\r\n])+? '###'
    ;

SUBHEADING
    : '####' (~[\r\n])+? '####'
    ;

CAPTION
    : '#####' (~[\r\n])+? '#####'
    ;

LISTING
    : '~~~~~' .+? '~~~~~'
    ;

ELEMENTPATH
    : '[[[[[' (~[\r\n])+? ']]]]]'
    ;

LABELREF
    : '{##' (~[\r\n])+? '##}'
    ;

LABEL
    : '{#' (~[\r\n])+? '#}'
    ;

ITALIC
    : '*' (~[\r\n])+? '*'
    ;

SINGLE_COMMENT
    : '//' (~[\r\n])+ -> channel(1)
    ;

MULTI_COMMENT
    : '/*' .*? '*/' -> channel(1)
    ;

STAR
    : '*'
    ;

BRACE_OPEN
    : '{'
    ;

TEXT
    : (~[\r\n*{])+
    ;
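
This version behaves better largely because each construct, delimiters included, is now a single token, and the catch-all TEXT excludes `*` and `{`, so ITALIC, LABEL and LABELREF get a chance to match; STAR and BRACE_OPEN pick up stray characters that do not start a complete construct. Below is a rough Python sketch of the same longest-match, first-rule-wins policy, using hypothetical regex approximations of these rules: Python's non-greedy `+?` stands in for ANTLR's, HEADLINE's backslash escape is omitted, and the channel(1) redirection of comments is not modelled.

```python
import re

# Hypothetical regex approximations of the answer's lexer rules, in grammar order.
RULES = [
    ("NL", re.compile(r"[\r\n]")),
    ("SUBHEADLINE", re.compile(r"##[^\r\n]+?##")),
    ("HEADLINE", re.compile(r"#[^\r\n]+?#")),  # '\\#' escape omitted
    ("LEAD", re.compile(r"###[^\r\n]+?###")),
    ("SUBHEADING", re.compile(r"####[^\r\n]+?####")),
    ("CAPTION", re.compile(r"#####[^\r\n]+?#####")),
    ("LISTING", re.compile(r"~~~~~.+?~~~~~", re.DOTALL)),  # ANTLR's '.' also matches newlines
    ("ELEMENTPATH", re.compile(r"\[\[\[\[\[[^\r\n]+?\]\]\]\]\]")),
    ("LABELREF", re.compile(r"\{##[^\r\n]+?##\}")),
    ("LABEL", re.compile(r"\{#[^\r\n]+?#\}")),
    ("ITALIC", re.compile(r"\*[^\r\n]+?\*")),
    ("SINGLE_COMMENT", re.compile(r"//[^\r\n]+")),
    ("MULTI_COMMENT", re.compile(r"/\*.*?\*/", re.DOTALL)),
    ("STAR", re.compile(r"\*")),
    ("BRACE_OPEN", re.compile(r"\{")),
    ("TEXT", re.compile(r"[^\r\n*{]+")),
]


def tokenize(text):
    """Longest match wins; on a tie the rule listed first wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"no rule matches at offset {pos}")
        tokens.append(best)
        pos += len(best[1])
    return tokens


print(tokenize("x *it* y"))
# [('TEXT', 'x '), ('ITALIC', '*it*'), ('TEXT', ' y')]
print(tokenize("a * b"))
# [('TEXT', 'a '), ('STAR', '*'), ('TEXT', ' b')]
```

Several inputs are ties with TEXT: `#Title#` is matched in full by both HEADLINE and TEXT, and HEADLINE wins only because it is listed first, so TEXT has to stay the last rule. The same policy means that, at least in this sketch, a line such as `###Lead### more` becomes a single TEXT token, because TEXT then matches more characters than LEAD.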

parser grammar dnpMDAuslagernParser;

options { tokenVocab=dnpMDAuslagernLexer; }

dnpMD
    : head body
    ;

head
    : subheadline headline lead
    ;

subheadline
    : SUBHEADLINE NL+
    ;

headline
    : HEADLINE NL+
    ;

lead
    : LEAD
    ;

subheading
    : SUBHEADING
    ;

caption
    : CAPTION
    ;

listing
    : LISTING (NL listingPath)? (NL label)? NL caption
    ;

image
    : caption (NL label)? (NL imagePath)?
    ;

listingPath
    : ELEMENTPATH
    ;

imagePath
    : ELEMENTPATH
    ;

labelRef
    : LABELREF
    ;

label
    : LABEL
    ;

italic
    : ITALIC
    ;

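// Note: SINGLE_COMMENT and MULTI_COMMENT are sent to channel(1) by the lexer, and a
// CommonTokenStream only hands default-channel tokens to the parser, so the
// singleComment and multiComment rules never match; comments are simply filtered out.
singleComment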
    : SINGLE_COMMENT
    ;

multiComment
    : MULTI_COMMENT
    ;

paragraph
    : TEXT? italic TEXT?
    | TEXT? STAR TEXT?
    | TEXT? labelRef TEXT?
    | TEXT? BRACE_OPEN TEXT?
    | TEXT? LABEL TEXT?
    | ELEMENTPATH
    | TEXT
    ;

newlines
    : NL+
    ;

body
    : bodyElements+
    ;

bodyElements
    : singleComment
    | multiComment
    | paragraph
    | subheading
    | listing
    | image
    | newlines
    ;

This grammar works fine for me, and maybe someone else can benefit from it.

Thanks to all who helped out! Fabian
