I am working on a small text editor project and want to add basic syntax highlighting for a couple of languages (Java and XML, to name two). As a learning experience, I want to use one of the existing Java lexer/parser generators, popular or not.
Which project do you recommend? ANTLR is probably the best known, but it seems pretty complex and heavy.
Here are the options that I know of:
- ANTLR
- Ragel (yes, it can generate Java source for processing input)
- Do it yourself (I guess I could write a simple token parser and highlight the source code).
ANTLR may seem complex and heavy but you don't need to use all of the functionality that it includes; it's nicely layered. I'm a big fan of using it to develop parsers. For starters, you can use the excellent ANTLRWorks to visualize and test the grammars that you are creating. It's really nice to be able to watch it capture tokens, build parse trees and step through the process.
For your text editor project, I would check out filter grammars, which might suit your needs nicely. For filter grammars you don't need to specify the entire lexical structure of your language, only the parts that you care about (i.e. need to highlight, color or index) and you can always add in more until you can handle a whole language.
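To make the "only the parts that you care about" idea concrete, here is a minimal sketch in plain Java. It is not ANTLR and not a filter grammar; it only imitates the same idea with `java.util.regex`. The class name `FilterScanner`, the `Span` type, and the tiny keyword list are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical filter-style scanner: describes only the spans worth coloring. */
final class FilterScanner {

    /** A region to highlight: its kind plus [start, end) offsets into the text. */
    static final class Span {
        final String kind;
        final int start;
        final int end;

        Span(String kind, int start, int end) {
            this.kind = kind;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return kind + "[" + start + "," + end + ")";
        }
    }

    // Only three constructs are described; all other input is silently skipped.
    // Comments and strings come first so keywords inside them are not colored.
    // The keyword list is deliberately incomplete (an assumption of this sketch).
    private static final Pattern INTERESTING = Pattern.compile(
        "(?<COMMENT>//[^\\n]*)"
        + "|(?<STRING>\"(?:\\\\.|[^\"\\\\\\n])*\")"
        + "|(?<KEYWORD>\\b(?:class|public|static|void|int|return|if|else)\\b)");

    /** Returns the highlightable spans in document order. */
    static List<Span> scan(String text) {
        List<Span> spans = new ArrayList<>();
        Matcher m = INTERESTING.matcher(text);
        while (m.find()) {
            String kind = m.group("COMMENT") != null ? "COMMENT"
                        : m.group("STRING") != null ? "STRING"
                        : "KEYWORD";
            spans.add(new Span(kind, m.start(), m.end()));
        }
        return spans;
    }

    public static void main(String[] args) {
        String source = "int x = 1; // int\nString t = \"if\";";
        for (Span span : scan(source)) {
            System.out.println(span + " " + source.substring(span.start, span.end));
        }
    }
}
```

Adding another construct (block comments, annotations, numbers) means adding one more alternative to the pattern, which is roughly how you would grow a filter grammar until it covers the whole language.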
ANTLR is the way to go. I would not build it by hand. If you look around the ANTLR web site, you'll also find that grammars are available for Java, XML, etc.
ANTLR or JavaCC would be the two I know. I'd recommend ANTLR first.
Google Code has a new project, acacia-lex. I wrote it myself; so far it seems to be a simple Java lexer that uses javax annotations.
JLex and CUP are decent lexer and parser generators, respectively. I'm currently using both to develop a simple scripting language for a project I'm working on.
I've done it with JFlex before and was quite satisfied with it. But the language I was highlighting was simple enough that I didn't need a parser generator, so your mileage may vary.
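To give a feel for how far a lexer-only pass can go on a simple language, here is a rough hand-written sketch for XML-like text. It is not JFlex output and does not use JFlex; `TinyXmlLexer` and its three token kinds are invented for this example, and it ignores details such as `>` inside attribute values, CDATA sections, and processing instructions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lexer-only highlighter for XML-like input; no parser involved. */
final class TinyXmlLexer {

    enum Kind { COMMENT, TAG, TEXT }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + ":" + text;
        }
    }

    /** Splits the input into comment, tag, and text tokens that cover every character. */
    static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Kind kind;
            int end;
            if (input.startsWith("<!--", pos)) {
                // Comment: runs to the closing "-->" or to end of input if unterminated.
                int close = input.indexOf("-->", pos + 4);
                end = close < 0 ? input.length() : close + 3;
                kind = Kind.COMMENT;
            } else if (input.charAt(pos) == '<') {
                // Tag: runs to the next '>' (simplification: quoted '>' is not handled).
                int close = input.indexOf('>', pos + 1);
                end = close < 0 ? input.length() : close + 1;
                kind = Kind.TAG;
            } else {
                // Plain text: everything up to the next '<'.
                int next = input.indexOf('<', pos);
                end = next < 0 ? input.length() : next;
                kind = Kind.TEXT;
            }
            tokens.add(new Token(kind, input.substring(pos, end)));
            pos = end;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [TAG:<a>, TEXT:hi, COMMENT:<!-- c -->, TAG:</a>]
        System.out.println(lex("<a>hi<!-- c --></a>"));
    }
}
```

An editor would map each `Kind` to a color and repaint the token's range. Once you need nesting or context (for example, coloring attribute names differently from values), a generated lexer or a real parser starts to pay off.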