Is there a more modern, maybe object-oriented, equivalent to Jack Crenshaw's "Let's Build a Compiler" series?
A while back I stumbled across "Let's Build a Compiler" and could just not resist writing some code. I wrote a recursive-descent C compiler in C# that output .NET CIL. "Write once, leak everywhere" was my slogan.
Too bad I did not realize until too late that parsing C is a nightmare.
I am now interested in writing a Java compiler in Java that outputs .NET CIL or assemblies, with the goal of being self-bootstrapping. I was hoping there might be some newer tutorials kicking around.
As an aside, would you spend more time on up-front design, or would you just write a ton of tests to support the ability to mercilessly refactor? Thinking back, I am leaning towards the latter. The compiler worked, but the code was really awful.
If you want to learn this stuff, you should have a look at the books "Language Implementation Patterns" and the ANTLR reference.
Take a look at Terence Parr's "Language Implementation Patterns". He wrote ANTLR, a parser generator for Java, so he knows his stuff. The book explains the principles of compiler design really well and builds up gradually.
Martin Fowler's "Domain Specific Languages" is also good. It has a slightly different agenda than being a pure compilers course, but is a good reference on the key concepts of language design.
It sounds like you completely missed the point of Crenshaw's tutorials. LBC isn't about writing pretty, clean, or efficient code. It's all about bringing something that's steeped in formal theory down to a level where the casual coder can easily and rapidly hack out a rudimentary (but working!) compiler.
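To give a feel for that style, here is a minimal sketch in Java (not Crenshaw's own code, which is in Pascal). It assumes a tiny grammar of integer expressions with `+ - * /` and parentheses, and it emits CIL-like stack instructions as plain text rather than a real assembly. The class name `TinyExprCompiler` and its methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one method per grammar rule, code emitted while parsing. */
final class TinyExprCompiler {
    private final String src;
    private int pos = 0;
    private final List<String> out = new ArrayList<>();

    private TinyExprCompiler(String src) { this.src = src; }

    /** Compiles an integer expression into CIL-like stack instructions. */
    static List<String> compile(String source) {
        TinyExprCompiler c = new TinyExprCompiler(source);
        c.expression();
        c.skipSpaces();
        if (c.pos < c.src.length()) {
            throw new IllegalArgumentException("Unexpected '" + c.peek() + "' at " + c.pos);
        }
        return c.out;
    }

    // expression ::= term (('+' | '-') term)*
    private void expression() {
        term();
        while (true) {
            skipSpaces();
            char op = peek();
            if (op != '+' && op != '-') return;
            pos++;
            term();
            // Both operands are already on the stack, so emit the operator last.
            out.add(op == '+' ? "add" : "sub");
        }
    }

    // term ::= factor (('*' | '/') factor)*
    private void term() {
        factor();
        while (true) {
            skipSpaces();
            char op = peek();
            if (op != '*' && op != '/') return;
            pos++;
            factor();
            out.add(op == '*' ? "mul" : "div");
        }
    }

    // factor ::= number | '(' expression ')'
    private void factor() {
        skipSpaces();
        if (peek() == '(') {
            pos++;
            expression();
            skipSpaces();
            if (peek() != ')') throw new IllegalArgumentException("Expected ')' at " + pos);
            pos++;
            return;
        }
        int start = pos;
        while (Character.isDigit(peek())) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected number at " + pos);
        out.add("ldc.i4 " + src.substring(start, pos));
    }

    // Single-character lookahead; '\0' marks end of input.
    private char peek() { return pos < src.length() ? src.charAt(pos) : '\0'; }

    private void skipSpaces() { while (peek() == ' ') pos++; }

    public static void main(String[] args) {
        for (String line : compile("2 + 3 * (4 - 1)")) System.out.println(line);
    }
}
```

There is no separate lexer, AST, or optimizer here. That is the trade-off: it is quick to hack out and easy to follow, but it gets harder to extend as the language grows.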
When I read through LBC years back, I rewrote the examples in C#. I'm sure the class layout isn't the best, or tasks segregated properly, but it's comparable to his Pascal. I'd be happy to share the code with you if you like-- let me know and I can post it online and share the link.
In my spare time I've been hacking out some writing with the aim of unifying the philosophies of LBC and Basics of Compiler Design: walking away with practical, working code at the end of each unit/chapter, while also discussing some theory after exploring the ideas, so the reader understands why things are the way they are. But it took Crenshaw years to write his incomplete series, so mine may be a pipe dream... and I use C (exactly because it's not C++ or Java).
I recently built a compiler at my company using BNFC. At first I was instructed to use Flex and Bison (C/C++), but I found them to be a pain, so I used BNFC to generate the Flex and Bison files.
I can't say I liked the code: my grammar was pretty big and so was the generated visitor, but it was nothing I couldn't handle. I TDDed from the beginning, so I always had enough tests to refactor, but I also kept a UML diagram to help me think about the additional classes I wrote.
There actually is a book called Implementing Programming Languages, self-described as "a self-study book, and to some extent, a manual to the BNFC tool". Had I read it, I would probably have struggled less with implementation decisions, but overall I found BNFC intuitive enough to use after reading only the manual and the tutorial.
Last but not least, it can also be used with other languages, including Java (with Cup and JLex).
Have you taken a look at the PyPy project? It is a Python implementation of the Python language. Maybe it can provide some inspiration for your goal of self-bootstrapping Java?
If you like to learn by example, the code for Finch, a little programming language of mine: