How can I parse code to build a compiler in Java?

Posted 2019-03-20 08:29

Question:

I need to write a compiler. It's a homework assignment at university. The teacher told us that we can use any API we want to parse the code, as long as it is a good one. That way we can focus more on the JVM code we will generate.

So yes, I'll write a compiler in Java to generate Java.

Do you know any good APIs for this? Should I use regex? I normally write my own parsers by hand, though that is not advisable in this scenario.

Any help would be appreciated.

Answer 1:

Regular expressions are useful in a compiler, but only for recognizing tokens (i.e., not for recursive structures such as nested parentheses).
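
A minimal sketch of that token-only use of regular expressions, assuming a toy language with integers, identifiers and single-character operators (the class name `RegexLexer` and the token kinds are made up for illustration). Each token kind is a named group in one `java.util.regex.Pattern`, and the lexer anchors a match at the current position with `Matcher.region` plus `lookingAt`. Matching nested structure, such as balanced parentheses, is left to the parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexLexer {
    // One named group per token kind. Alternatives are tried left to right,
    // so earlier kinds win when two could match at the same position.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<WS>\\s+)|(?<NUM>\\d+)|(?<ID>[A-Za-z_][A-Za-z0-9_]*)|(?<OP>[-+*/=();])");

    // Token kinds that are kept; whitespace (WS) is skipped.
    private static final String[] KINDS = {"NUM", "ID", "OP"};

    public static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        int pos = 0;
        while (pos < src.length()) {
            // Restrict the matcher to the rest of the input and require a match starting at pos.
            m.region(pos, src.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException(
                    "Unexpected character at " + pos + ": '" + src.charAt(pos) + "'");
            }
            if (m.group("WS") == null) {
                for (String kind : KINDS) {
                    if (m.group(kind) != null) {
                        tokens.add(kind + ":" + m.group(kind));
                        break;
                    }
                }
            }
            pos = m.end(); // end() is an absolute index into src
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [ID:x, OP:=, NUM:3, OP:*, OP:(, ID:y, OP:+, NUM:42, OP:), OP:;]
        System.out.println(tokenize("x = 3 * (y + 42);"));
    }
}
```

Note that the lexer happily emits `(` and `)` as flat tokens; checking that they nest properly is exactly the part a regular expression cannot do.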

The classic way of writing a compiler is to have a lexical analyzer for recognizing tokens, a syntax analyzer for recognizing structure, a semantic analyzer for recognizing meaning, an intermediate code generator, an optimizer, and finally a target code generator. Any of those steps can be merged, or skipped entirely, if that makes the compiler easier to write.
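
To show how the later phases can stay separate passes over one shared tree, here is a minimal sketch that assumes the syntax analyzer has already produced an AST for integer arithmetic. It runs an optimizer pass (constant folding) and then a target code generator that walks the tree in post-order and emits JVM-style stack instructions as text. The names `Backend`, `Num`, `Var` and `Bin` are hypothetical, and the output only resembles JVM assembly mnemonics (`ldc`, `iload`, `iadd`, ...); a real backend would map variables to local-variable slots and write an actual class file, for example with a bytecode library such as ASM.

```java
import java.util.ArrayList;
import java.util.List;

public class Backend {
    // AST node types, as a (not shown) syntax analyzer might produce them.
    interface Expr {}
    static final class Num implements Expr {
        final int value;
        Num(int value) { this.value = value; }
    }
    static final class Var implements Expr {
        final String name;
        Var(String name) { this.name = name; }
    }
    static final class Bin implements Expr {
        final char op;
        final Expr left, right;
        Bin(char op, Expr left, Expr right) { this.op = op; this.left = left; this.right = right; }
    }

    // Optimizer phase: fold constant subtrees bottom-up.
    static Expr fold(Expr e) {
        if (!(e instanceof Bin)) return e;
        Bin b = (Bin) e;
        Expr l = fold(b.left), r = fold(b.right);
        if (l instanceof Num && r instanceof Num) {
            int x = ((Num) l).value, y = ((Num) r).value;
            switch (b.op) {
                case '+': return new Num(x + y);
                case '-': return new Num(x - y);
                case '*': return new Num(x * y);
                // '/' is deliberately not folded, so a division by zero is not evaluated at compile time.
            }
        }
        return new Bin(b.op, l, r);
    }

    // Code generator phase: operands first, then the operator (stack-machine order).
    static void emit(Expr e, List<String> out) {
        if (e instanceof Num) {
            out.add("ldc " + ((Num) e).value);
        } else if (e instanceof Var) {
            out.add("iload " + ((Var) e).name); // a real compiler would emit a slot index here
        } else {
            Bin b = (Bin) e;
            emit(b.left, out);
            emit(b.right, out);
            switch (b.op) {
                case '+': out.add("iadd"); break;
                case '-': out.add("isub"); break;
                case '*': out.add("imul"); break;
                case '/': out.add("idiv"); break;
                default: throw new IllegalArgumentException("unknown operator " + b.op);
            }
        }
    }

    public static List<String> compile(Expr e) {
        List<String> out = new ArrayList<>();
        emit(fold(e), out);
        return out;
    }

    public static void main(String[] args) {
        // AST for x * (2 + 3); prints [iload x, ldc 5, imul]
        Expr ast = new Bin('*', new Var("x"), new Bin('+', new Num(2), new Num(3)));
        System.out.println(compile(ast));
    }
}
```

Keeping `fold` and `emit` as independent functions is what makes it easy to merge or skip phases: dropping the optimizer is just calling `emit` on the unfolded tree.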

There have been many tools developed to help with this process. For Java, you can look at

  • ANTLR - http://www.antlr.org/
  • Coco/R - http://ssw.jku.at/Coco/
  • JavaCC - https://javacc.dev.java.net/
  • SableCC - http://sablecc.org/


Answer 2:

I would recommend ANTLR, primarily because of its output generation capabilities via StringTemplate.

Better still, Terence Parr's book on ANTLR is one of the best books on writing compilers with a parser generator.

There is also ANTLRWorks, which lets you study and debug your grammar interactively.

To top it all off, the ANTLR wiki and documentation (although not as comprehensive as I'd like) are a good starting point for any beginner. They helped me refresh my knowledge of compiler writing in a week.



Answer 3:

Have a look at JavaCC, a parser generator for Java. It's very easy to use and to get the hang of.



Answer 4:

Go classic: Lex + Yacc. In Java, that spells JAX and JavaCC. JavaCC even comes with some Java grammars you can inspect.



Answer 5:

I'd recommend using either a metacompiler like ANTLR or a simple parser combinator library. Functional Java has a parser combinator API. There's also JParsec. Both are based on the Parsec library for Haskell.
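
To give a feel for the combinator style without relying on either library's actual API, here is a minimal hand-rolled sketch in plain Java. It is not JParsec or Functional Java; the names `Parser`, `Result`, `ch`, `number`, `op` and `chainl1` are illustrative (`chainl1` is named after the Parsec combinator of the same name). Each parser is a function from (input, position) to an optional result, and bigger parsers are built by combining smaller ones. It assumes input without whitespace, and unlike Parsec's default `<|>`, `or` here always backtracks.

```java
import java.util.Optional;
import java.util.function.BinaryOperator;
import java.util.function.Function;

public class MiniCombinators {
    // A successful parse: the value produced and the position just after the consumed input.
    static final class Result<T> {
        final T value;
        final int pos;
        Result(T value, int pos) { this.value = value; this.pos = pos; }
    }

    // A parser maps (input, start position) to an optional Result.
    interface Parser<T> {
        Optional<Result<T>> parse(String in, int pos);

        // Transform the value of a successful parse.
        default <U> Parser<U> map(Function<? super T, ? extends U> f) {
            return (in, pos) -> parse(in, pos).map(r -> new Result<U>(f.apply(r.value), r.pos));
        }

        // Try this parser; on failure, try the other one from the same position.
        default Parser<T> or(Parser<T> other) {
            return (in, pos) -> {
                Optional<Result<T>> r = parse(in, pos);
                return r.isPresent() ? r : other.parse(in, pos);
            };
        }
    }

    // Match one specific character.
    static Parser<Character> ch(char c) {
        return (in, pos) -> {
            if (pos < in.length() && in.charAt(pos) == c) return Optional.of(new Result<Character>(c, pos + 1));
            return Optional.empty();
        };
    }

    // Match one or more ASCII digits and convert them to an int.
    static Parser<Integer> number() {
        return (in, pos) -> {
            int end = pos;
            while (end < in.length() && in.charAt(end) >= '0' && in.charAt(end) <= '9') end++;
            if (end == pos) return Optional.empty();
            return Optional.of(new Result<Integer>(Integer.parseInt(in.substring(pos, end)), end));
        };
    }

    // operand (op operand)*, combined left to right, so "10-4-3" means (10-4)-3.
    static <T> Parser<T> chainl1(Parser<T> operand, Parser<BinaryOperator<T>> op) {
        return (in, pos) -> {
            Optional<Result<T>> first = operand.parse(in, pos);
            if (!first.isPresent()) return Optional.empty();
            T acc = first.get().value;
            int cur = first.get().pos;
            while (true) {
                Optional<Result<BinaryOperator<T>>> o = op.parse(in, cur);
                if (!o.isPresent()) break;
                Optional<Result<T>> rhs = operand.parse(in, o.get().pos);
                if (!rhs.isPresent()) break; // leave a dangling operator unconsumed
                acc = o.get().value.apply(acc, rhs.get().value);
                cur = rhs.get().pos;
            }
            return Optional.of(new Result<T>(acc, cur));
        };
    }

    // A character parser that yields the arithmetic function it stands for.
    static Parser<BinaryOperator<Integer>> op(char c, BinaryOperator<Integer> f) {
        return ch(c).map(ignored -> f);
    }

    // expr = term (('+' | '-') term)*;  term = factor (('*' | '/') factor)*;  factor = number | '(' expr ')'
    static Parser<Integer> expr() {
        return chainl1(term(), op('+', (a, b) -> a + b).or(op('-', (a, b) -> a - b)));
    }
    static Parser<Integer> term() {
        return chainl1(factor(), op('*', (a, b) -> a * b).or(op('/', (a, b) -> a / b)));
    }
    static Parser<Integer> factor() {
        // expr() is only called inside the lambda, at parse time, which avoids infinite recursion while building.
        Parser<Integer> parens = (in, pos) -> ch('(').parse(in, pos)
            .flatMap(open -> expr().parse(in, open.pos))
            .flatMap(inner -> ch(')').parse(in, inner.pos)
                .map(close -> new Result<Integer>(inner.value, close.pos)));
        return number().or(parens);
    }

    // Parse and evaluate, requiring the whole input to be consumed.
    public static int evaluate(String src) {
        Optional<Result<Integer>> r = expr().parse(src, 0);
        if (!r.isPresent() || r.get().pos != src.length()) {
            throw new IllegalArgumentException("syntax error in: " + src);
        }
        return r.get().value;
    }
}
```

For example, `MiniCombinators.evaluate("2*(3+4)-5")` returns 9. The grammar in the comment maps almost one-to-one onto the `expr`, `term` and `factor` methods, which is the main appeal of combinators over hand-written parsing code.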



Answer 6:

JFlex is a scanner generator which, according to the manual, is designed to work with the parser generator CUP.

One of the main design goals of JFlex was to make interfacing with the free Java parser generator CUP as easy as possibly [sic].

It also has support for BYACC/J, which, as its name suggests, is a port of Berkeley YACC to generate Java code.

I have used JFlex itself and liked it. However, the project I was doing was simple enough that I wrote the parser by hand, so I don't know how good either CUP or BYACC/J is.



Answer 7:

I've used SableCC in my compiler course, though not by choice.

I remember finding it very bulky and heavyweight, with more emphasis on cleanliness than convenience (there are no operator precedence declarations or the like; you have to encode precedence in the grammar itself).
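
To illustrate what encoding precedence in the grammar means: instead of yacc-style declarations such as `%left '+' '-'`, you write one rule per precedence level, so an operator that binds tighter only appears in a lower rule. Below is a minimal hand-written recursive-descent sketch of that layering in Java; it is not SableCC syntax, and the class name and toy grammar are assumptions. It returns a fully parenthesized string so the shape of the resulting tree is visible, and it simply strips spaces before parsing.

```java
public class PrecedenceByGrammar {
    // Precedence encoded as one rule per level, lowest precedence first:
    //   expr   : term   (('+' | '-') term)*
    //   term   : factor (('*' | '/') factor)*
    //   factor : NUMBER | '(' expr ')'
    // '*' is only reachable through term, so it binds tighter than '+'.
    private final String src;
    private int pos = 0;

    private PrecedenceByGrammar(String src) { this.src = src.replace(" ", ""); }

    public static String parse(String src) {
        PrecedenceByGrammar p = new PrecedenceByGrammar(src);
        String tree = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected '" + p.src.charAt(p.pos) + "' at " + p.pos);
        }
        return tree;
    }

    private String expr() {
        String left = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            left = "(" + left + " " + op + " " + term() + ")"; // looping makes the operator left-associative
        }
        return left;
    }

    private String term() {
        String left = factor();
        while (peek() == '*' || peek() == '/') {
            char op = src.charAt(pos++);
            left = "(" + left + " " + op + " " + factor() + ")";
        }
        return left;
    }

    private String factor() {
        if (peek() == '(') {
            pos++;
            String inner = expr();
            expect(')');
            return inner;
        }
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) pos++;
        if (start == pos) throw new IllegalArgumentException("number expected at " + pos);
        return src.substring(start, pos);
    }

    // Current character, or '\0' at end of input.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalArgumentException("'" + c + "' expected at " + pos);
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 + 2 * 3"));   // (1 + (2 * 3))
        System.out.println(parse("1 - 2 - 3"));   // ((1 - 2) - 3)
        System.out.println(parse("(1 + 2) * 3")); // ((1 + 2) * 3)
    }
}
```

Adding a new precedence level (say, exponentiation) means inserting one more rule between `term` and `factor`, which is the extra work a precedence declaration would otherwise save you.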

I'd probably want to use something else if I had the choice. My experiences with yacc (for C) and Happy (for Haskell) have both been pleasant.



Answer 8:

Parser combinators are a good choice. A popular Java implementation is JParsec.



Answer 9:

If you're going to go hardcore, throw a bit of LLVM (http://llvm.org) into the mix :)



Answer 10:

I suggest you look at the source for BeanShell. It contains a parser and interpreter for Java source and is fairly simple to read.



Answer 11:

http://java-source.net/open-source/parser-generators and http://catalog.compilertools.net/java.html contain catalogs of tools for this. See also the Stack Overflow question "Alternatives to Regular Expressions".



Answer 12:

Use a parser combinator, like JParsec. There's a good video tutorial on how to use it.