How do I make my own parser for java/jsf code?

Posted 2019-06-27 23:54

Hi, I'd like to write my own parser, e.g. for computing (4+(3-4^2))*2 or for parsing Java, JSF, or HTML code.

I've actually done something like this already, but I feel it isn't very good.

Is there anything good I could use? I've tried reading up on it, but I'm a bit confused by LL, LR, AST, BNF, javacc, yacc, etc. :). I'm not sure which way to go when I want to compute 4+...

or when I want to parse Java/JSF code and produce something from it (other Java code).

Is there something general enough, like an AST, or something I can use for both?

Thank you for your help.

Tags: java parsing
8 answers
不美不萌又怎样
Post #2 · 2019-06-28 00:18

Parsers can be pretty intense to write. The standard tools are bison or yacc for the grammar (the parser proper) and flex for the lexer (splitting the input into tokens). These traditionally generate C or C++ code.
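To make the lexer/parser split concrete: the lexer's job (what flex generates) is just turning raw characters into tokens, and for a small language it is easy to write by hand. A minimal Java sketch, assuming Java 16+ and a hypothetical token set (non-negative integers, the operators + - * / ^, and parentheses); the class and token names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer: does the job flex would generate in C.
class Lexer {
    enum Kind { NUMBER, OPERATOR, LPAREN, RPAREN }

    record Token(Kind kind, String text) {}

    static List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip blanks between tokens
            } else if (Character.isDigit(c)) {
                // a NUMBER token is a maximal run of digits
                int start = i;
                while (i < input.length() && Character.isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.NUMBER, input.substring(start, i)));
            } else if ("+-*/^".indexOf(c) >= 0) {
                tokens.add(new Token(Kind.OPERATOR, String.valueOf(c)));
                i++;
            } else if (c == '(') {
                tokens.add(new Token(Kind.LPAREN, "("));
                i++;
            } else if (c == ')') {
                tokens.add(new Token(Kind.RPAREN, ")"));
                i++;
            } else {
                throw new IllegalArgumentException("Unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // print one token per line: its kind, then its text
        for (Token t : tokenize("(4+(3-4^2))*2")) {
            System.out.println(t.kind() + " " + t.text());
        }
    }
}
```

A parser (hand-written or generated) would then consume this token list instead of raw characters, which keeps the grammar rules free of whitespace and digit-handling details.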

叛逆
Post #3 · 2019-06-28 00:19

You might want to check out Building Parsers With Java by Steven John Metsker. The book seems to cover exactly what you are looking to do.

骚年 ilove
Post #4 · 2019-06-28 00:20

You might want to check out http://antlr.org/. It generates Java code. If I recall correctly, one of their samples is pretty much what you want.

Lonely孤独者°
Post #5 · 2019-06-28 00:26

If this is a learning exercise, try starting with a hand-written top-down parser: they are simple to write and don't require learning any other tools. The best places to research the basics are probably Wikipedia or CodeProject.
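As an illustration of how little code a top-down parser needs, here is a minimal recursive-descent evaluator in Java for expressions like the one in the question. It is a sketch under stated assumptions: only non-negative integer literals, + - * / ^ and parentheses; ^ is treated as right-associative exponentiation; no unary minus; the class name ExprEvaluator is hypothetical. Each grammar rule in the header comment becomes one method.

```java
// Hypothetical recursive-descent (top-down) evaluator, one method per grammar rule:
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := atom   ('^' factor)?        (right-associative)
//   atom   := NUMBER | '(' expr ')'
class ExprEvaluator {
    private final String src;
    private int pos = 0;

    private ExprEvaluator(String src) {
        this.src = src.replace(" ", ""); // ignore spaces for simplicity
    }

    static double evaluate(String input) {
        ExprEvaluator p = new ExprEvaluator(input);
        double value = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected '" + p.src.charAt(p.pos) + "' at " + p.pos);
        }
        return value;
    }

    // consume the expected character if it is next; report whether it was
    private boolean eat(char expected) {
        if (pos < src.length() && src.charAt(pos) == expected) {
            pos++;
            return true;
        }
        return false;
    }

    private double expr() {
        double value = term();
        while (true) { // loop gives left associativity: 10-3-2 = (10-3)-2
            if (eat('+')) value += term();
            else if (eat('-')) value -= term();
            else return value;
        }
    }

    private double term() {
        double value = factor();
        while (true) {
            if (eat('*')) value *= factor();
            else if (eat('/')) value /= factor();
            else return value;
        }
    }

    private double factor() {
        double base = atom();
        if (eat('^')) {
            return Math.pow(base, factor()); // recursion makes ^ right-associative
        }
        return base;
    }

    private double atom() {
        if (eat('(')) {
            double value = expr();
            if (!eat(')')) throw new IllegalArgumentException("Expected ')' at " + pos);
            return value;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a number at " + pos);
        return Double.parseDouble(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(evaluate("(4+(3-4^2))*2")); // prints -18.0
    }
}
```

The loops in expr() and term() keep left associativity (10-3-2 gives 5), while the recursive call in factor() makes 2^3^2 equal 2^9 = 512, the usual math convention.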

Deceive 欺骗
Post #6 · 2019-06-28 00:27

Before anything else, you have to understand that everything about parsing is based on grammars.

Grammars describe the language you want to implement in terms of how to decompose the text into basic units and how to stack those units in some meaningful way. You may also want to read up on the concepts of tokens, terminals, and non-terminals.
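The "meaningful stacking" a grammar describes is usually captured as an abstract syntax tree (AST), which also speaks to the "something I can use for both" part of the question: parse once, then walk the same tree for different purposes. A minimal Java 16+ sketch with hypothetical node types (Node, Num, BinOp); the tree is built by hand here, where a real parser would produce it:

```java
// Hypothetical AST node types for arithmetic expressions (Java 16+).
interface Node {
    double eval();   // use 1: compute the value
    String toJava(); // use 2: generate equivalent Java source text
}

// Leaf node: an integer literal.
record Num(long value) implements Node {
    public double eval() { return value; }
    public String toJava() { return Long.toString(value); }
}

// Inner node: a binary operator applied to two subtrees.
record BinOp(char op, Node left, Node right) implements Node {
    public double eval() {
        double l = left.eval();
        double r = right.eval();
        return switch (op) {
            case '+' -> l + r;
            case '-' -> l - r;
            case '*' -> l * r;
            case '/' -> l / r;
            case '^' -> Math.pow(l, r);
            default -> throw new IllegalStateException("Unknown operator: " + op);
        };
    }

    public String toJava() {
        // Java has no power operator (^ is XOR), so emit a Math.pow call instead
        if (op == '^') {
            return "Math.pow(" + left.toJava() + ", " + right.toJava() + ")";
        }
        return "(" + left.toJava() + " " + op + " " + right.toJava() + ")";
    }
}

class AstDemo {
    public static void main(String[] args) {
        // Tree for (4+(3-4^2))*2, built by hand; a parser would normally produce it
        Node tree = new BinOp('*',
                new BinOp('+', new Num(4),
                        new BinOp('-', new Num(3),
                                new BinOp('^', new Num(4), new Num(2)))),
                new Num(2));
        System.out.println(tree.eval());   // -18.0
        System.out.println(tree.toJava()); // ((4 + (3 - Math.pow(4, 2))) * 2)
    }
}
```

Adding another kind of output (pretty-printing, or generating different Java code) then means adding another walk over the tree rather than writing another parser.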

The differences between LL and LR come in two kinds: implementation differences and grammar-writing differences. If you use a standard tool, you only need to understand the second kind.

I usually use LL (top-down) grammars. They are simpler to write, and simple to implement even with hand-written code. LR grammars can in theory describe a larger class of languages, but in everyday use I find they mostly get in the way when you need good error detection.
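One concrete grammar-writing difference: an LR tool accepts a left-recursive rule such as expr := expr '-' NUMBER | NUMBER directly, but a plain top-down parser would call expr() again before consuming any input and recurse forever, so LL grammars get rewritten. A minimal Java sketch (the class name Associativity is hypothetical) comparing two possible rewrites on the input 10-3-2:

```java
// Hypothetical demo of rewriting a left-recursive rule for a top-down parser.
// LR form (not usable top-down):  expr := expr '-' NUMBER | NUMBER
class Associativity {
    private final String src;
    private int pos = 0;

    Associativity(String src) { this.src = src; }

    // Reads a run of digits; throws NumberFormatException if there are none.
    private long number() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        return Long.parseLong(src.substring(start, pos));
    }

    // Naive rewrite, expr := NUMBER ('-' expr)?
    // Right-recursive, so it groups 10-3-2 as 10-(3-2).
    long rightRecursive() {
        long left = number();
        if (pos < src.length() && src.charAt(pos) == '-') {
            pos++;
            return left - rightRecursive();
        }
        return left;
    }

    // Usual LL rewrite, expr := NUMBER ('-' NUMBER)*
    // The loop keeps left associativity: (10-3)-2.
    long iterative() {
        long value = number();
        while (pos < src.length() && src.charAt(pos) == '-') {
            pos++;
            value -= number();
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new Associativity("10-3-2").rightRecursive()); // 9 (wrong grouping)
        System.out.println(new Associativity("10-3-2").iterative());      // 5
    }
}
```

Both rewrites accept exactly the same strings, but the right-recursive one silently computes 9; the loop is the common LL idiom because it preserves the meaning the left-recursive rule had.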

Some random pointers:

  • javacc (Java, LL)
  • antlr (Java, LL)
  • yepp (SmartEiffel, LL)
  • bison (C, LR; the GNU version of the venerable yacc)
chillily
Post #7 · 2019-06-28 00:34

ANTLR, but make sure you read The Definitive ANTLR Reference, which walks you through the creation of parsers. ANTLR generates top-down (LL) parsers, so the book doesn't cover LALR or other parser types.

JavaCC, Yacc, and SableCC are more traditional lexer/parser generators; you'll find that they're a little more primitive and have steeper learning curves. ANTLR is equally powerful, but you don't have to learn it all at once. Wikipedia offers a comprehensive comparison of parser generators.

BNF is a notation for specifying grammars; ANTLR uses its own notation, which I find more aesthetically pleasing, though others often don't.
