How can I parse code to build a compiler in Java?

Posted 2019-03-20 08:02

I need to write a compiler. It's a university assignment. The teacher told us that we can use any API we want to parse the code, as long as it is a good one. That way we can focus more on the JVM code we will generate.

So yes, I'll write a compiler in Java to generate Java.

Do you know of a good API for this? Should I use regex? I normally write my own parsers by hand, but that isn't advisable in this scenario.

Any help would be appreciated.

欢心
Post #2 · 2019-03-20 08:16

I would recommend ANTLR, primarily because of its output generation capabilities via StringTemplate.

Better still, Terence Parr's book on ANTLR is one of the best books on writing compilers with a parser generator.

Then there is ANTLRWorks, which lets you study and debug your grammar interactively.

To top it all off, the ANTLR wiki and documentation (although not as comprehensive as I would like) are a good place for any beginner to start. They helped me refresh my knowledge of compiler writing in a week.

【Aperson】
Post #3 · 2019-03-20 08:16

If you're going to go hardcore, throw a bit of LLVM (http://llvm.org) into the mix :)

孤傲高冷的网名
Post #4 · 2019-03-20 08:17

I've used SableCC in my compiler course, though not by choice.

I remember finding it very bulky and heavyweight, with more emphasis on cleanliness than convenience (there are no operator-precedence declarations; you have to encode precedence in the grammar itself).

I'd probably want to use something else if I had the choice. My experiences with yacc (for C) and happy (for Haskell) have both been pleasant.
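
To illustrate what "encoding precedence in the grammar" means, here is a minimal hand-written recursive-descent sketch in Java. It is not SableCC code (SableCC generates its parsers from a grammar file); it only shows the grammar shape you end up writing: one rule per precedence level, where each rule calls the rule for the next-tighter level, so `*` binds tighter than `+` without any precedence table. The class name `PrecedenceParser` is hypothetical, and the sketch only handles non-negative integer literals, `+ - * /` and parentheses.

```java
// Grammar (precedence comes from how the rules nest, not from a table):
//   expr   := term   (('+' | '-') term)*
//   term   := factor (('*' | '/') factor)*
//   factor := NUMBER | '(' expr ')'
final class PrecedenceParser {
    private final String src;
    private int pos = 0;

    private PrecedenceParser(String src) { this.src = src; }

    /** Parses and evaluates an integer expression; the whole input must be consumed. */
    static int evaluate(String input) {
        PrecedenceParser p = new PrecedenceParser(input);
        int value = p.expr();
        p.skipSpaces();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at " + p.pos);
        }
        return value;
    }

    // Lowest precedence: '+' and '-', applied left to right.
    private int expr() {
        int value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // Higher precedence: '*' and '/' bind tighter because expr() delegates to term().
    private int term() {
        int value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // Highest precedence: literals and parenthesized sub-expressions.
    private int factor() {
        if (accept('(')) {
            int value = expr();
            if (!accept(')')) throw new IllegalArgumentException("Expected ')' at " + pos);
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected a number at " + pos);
        return Integer.parseInt(src.substring(start, pos));
    }

    // Consumes c (after optional whitespace) if it is the next character.
    private boolean accept(char c) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }

    private void skipSpaces() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 + 3 * 4"));
        System.out.println(evaluate("(2 + 3) * 4"));
    }
}
```

The `while` loops make each level left-associative, so `10 - 4 - 3` is read as `(10 - 4) - 3`.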

欢心
Post #5 · 2019-03-20 08:20

Use a parser combinator library such as JParsec. There is a good video tutorial on how to use it.
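
The idea behind a parser combinator library is that parsers are ordinary values, and larger parsers are built by combining smaller ones. The sketch below hand-rolls a toy version of that idea in plain Java so it runs without dependencies. `TinyCombinators`, `Parser`, `ch`, `number`, `chainLeft` and `evaluate` are hypothetical names, not JParsec's API, and whitespace handling and error reporting are deliberately left out.

```java
import java.util.function.BinaryOperator;
import java.util.function.Function;

// A toy parser-combinator library for illustration only (NOT JParsec's API).
final class TinyCombinators {

    // A successful parse: the value plus the input position right after it.
    static final class Result<T> {
        final T value;
        final int next;
        Result(T value, int next) { this.value = value; this.next = next; }
    }

    // A parser tries to read a T starting at pos and returns null on failure.
    interface Parser<T> {
        Result<T> parse(String in, int pos);

        // Combinator: transform the parsed value without consuming extra input.
        default <R> Parser<R> map(Function<T, R> f) {
            return (in, pos) -> {
                Result<T> r = parse(in, pos);
                return r == null ? null : new Result<R>(f.apply(r.value), r.next);
            };
        }
    }

    // Primitive parser: exactly one expected character.
    static Parser<Character> ch(char c) {
        return (in, pos) -> pos < in.length() && in.charAt(pos) == c
                ? new Result<Character>(c, pos + 1) : null;
    }

    // Primitive parser: one or more decimal digits, converted to an int.
    static Parser<Integer> number() {
        return (in, pos) -> {
            int end = pos;
            while (end < in.length() && Character.isDigit(in.charAt(end))) end++;
            if (end == pos) return null;
            return new Result<Integer>(Integer.parseInt(in.substring(pos, end)), end);
        };
    }

    // Combinator: operand (op operand)*, folded left-associatively.
    static <T> Parser<T> chainLeft(Parser<T> operand, Parser<BinaryOperator<T>> op) {
        return (in, pos) -> {
            Result<T> acc = operand.parse(in, pos);
            if (acc == null) return null;
            while (true) {
                Result<BinaryOperator<T>> o = op.parse(in, acc.next);
                if (o == null) return acc;
                Result<T> rhs = operand.parse(in, o.next);
                if (rhs == null) return acc; // leave a dangling operator unconsumed
                acc = new Result<T>(o.value.apply(acc.value, rhs.value), rhs.next);
            }
        };
    }

    // Grammar assembled from the pieces above ('*' binds tighter than '+'):
    //   sum     := product ('+' product)*
    //   product := number  ('*' number)*
    static final Parser<Integer> SUM;
    static {
        BinaryOperator<Integer> add = (a, b) -> a + b;
        BinaryOperator<Integer> mul = (a, b) -> a * b;
        Parser<BinaryOperator<Integer>> plus = ch('+').map(c -> add);
        Parser<BinaryOperator<Integer>> times = ch('*').map(c -> mul);
        Parser<Integer> product = chainLeft(number(), times);
        SUM = chainLeft(product, plus);
    }

    // Runs the grammar and insists that the whole input is consumed.
    static int evaluate(String input) {
        Result<Integer> r = SUM.parse(input, 0);
        if (r == null || r.next != input.length()) {
            throw new IllegalArgumentException("Syntax error in: " + input);
        }
        return r.value;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2+3*4"));
    }
}
```

Note how the grammar lives in ordinary Java code (the `static` block) rather than in a separate grammar file that a generator has to process.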

祖国的老花朵
Post #6 · 2019-03-20 08:21

Regular expressions are useful in a compiler, but only for recognizing tokens, i.e. not for recursive (nested) structures.
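
As a sketch of that division of labour, the lexer below uses a single `java.util.regex` pattern with one named group per token kind. The token set and the class name `RegexLexer` are hypothetical. It emits `LPAREN`/`RPAREN` tokens without checking whether they balance: matching nested parentheses is the parser's job, because regular expressions cannot describe arbitrarily deep nesting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A regex-driven lexer: regular expressions recognize flat tokens only.
final class RegexLexer {
    // Token kinds for a tiny, hypothetical language; names match the regex groups below.
    enum Kind { NUMBER, IDENT, OP, LPAREN, RPAREN }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + "(" + text + ")"; }
    }

    // One alternation with a named group per token kind; whitespace (WS) is matched but dropped.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<WS>\\s+)|(?<NUMBER>\\d+)|(?<IDENT>[A-Za-z_][A-Za-z0-9_]*)"
          + "|(?<OP>[-+*/=])|(?<LPAREN>\\()|(?<RPAREN>\\))");

    static List<Token> tokenize(String src) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            // Restrict the matcher to the rest of the input; lookingAt() anchors at its start.
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                        "Unexpected character '" + src.charAt(pos) + "' at " + pos);
            }
            if (m.group("WS") == null) {
                // Find which named group matched and record a token of that kind.
                for (Kind k : Kind.values()) {
                    if (m.group(k.name()) != null) {
                        tokens.add(new Token(k, m.group()));
                        break;
                    }
                }
            }
            pos = m.end(); // absolute index, so the loop always advances
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("x = (y + 42) * 3"));
    }
}
```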

The classic way of writing a compiler is to have a lexical analyzer for recognizing tokens, a syntax analyzer for recognizing structure, a semantic analyzer for recognizing meaning, an intermediate code generator, an optimizer, and finally a target code generator. Any of those steps can be merged, or skipped entirely, if that makes the compiler easier to write.
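
To make the later phases concrete, here is a minimal sketch of two of them on a tiny expression AST: an optimizer that performs constant folding, and a target code generator that walks the tree in post-order and emits JVM-style stack instructions (`iload`, `ldc`, `iadd` and so on) as text. The class `MiniBackend` and its AST types are hypothetical, the tree is built by hand in place of a real parser, and producing an actual `.class` file would additionally require an assembler or bytecode library.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of two back-end phases (optimizer + target code generator) on a tiny AST.
final class MiniBackend {
    // --- AST, as a syntax analyzer might produce it ---
    interface Expr {}
    static final class Num implements Expr { final int value; Num(int v) { value = v; } }
    static final class Var implements Expr { final int slot; Var(int s) { slot = s; } } // local-variable slot
    static final class Bin implements Expr {
        final char op; final Expr left, right;
        Bin(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    }

    // --- Optimizer: fold operations whose operands are both constants ---
    static Expr fold(Expr e) {
        if (!(e instanceof Bin)) return e;
        Bin b = (Bin) e;
        Expr l = fold(b.left), r = fold(b.right);
        if (l instanceof Num && r instanceof Num) {
            int x = ((Num) l).value, y = ((Num) r).value;
            switch (b.op) {
                case '+': return new Num(x + y);
                case '-': return new Num(x - y);
                case '*': return new Num(x * y);
                // '/' is not folded, so division by zero stays a runtime concern
            }
        }
        return new Bin(b.op, l, r);
    }

    // --- Target code generator: operands first, then the operator (post-order) ---
    static void emit(Expr e, List<String> out) {
        if (e instanceof Num) {
            out.add("ldc " + ((Num) e).value);
        } else if (e instanceof Var) {
            out.add("iload " + ((Var) e).slot);
        } else {
            Bin b = (Bin) e;
            emit(b.left, out);
            emit(b.right, out);
            out.add(opcode(b.op));
        }
    }

    // Maps a source operator to the JVM integer arithmetic mnemonic.
    static String opcode(char op) {
        switch (op) {
            case '+': return "iadd";
            case '-': return "isub";
            case '*': return "imul";
            case '/': return "idiv";
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }

    // Pipeline: optimize, then generate code.
    static List<String> compile(Expr e) {
        List<String> code = new ArrayList<>();
        emit(fold(e), code);
        return code;
    }

    public static void main(String[] args) {
        // x * (2 + 3), with x in local slot 0
        Expr ast = new Bin('*', new Var(0), new Bin('+', new Num(2), new Num(3)));
        System.out.println(compile(ast));
    }
}
```

For `x * (2 + 3)` with `x` in local slot 0, the optimizer reduces `2 + 3` to `5` before code generation, so only three instructions are emitted: `iload 0`, `ldc 5`, `imul`.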

Many tools have been developed to help with this process. For Java, you can look at

孤傲高冷的网名
Post #7 · 2019-03-20 08:23

Go classic: Lex + Yacc. In Java, that translates to JAX and JavaCC. JavaCC even has some ready-made Java grammars you can inspect.
