We've used ANTLR to create a parser for a SQL-like grammar, and while the results are satisfactory in most cases, there are a few edge cases that we need to fix; and since we didn't write the parser ourselves we don't really understand it well enough to be able to make sensible changes.
So, we'd like to write our own parser. What's the best way to go about writing a parser by hand? What sort of parser should we use - recursive descent has been recommended; is that right? We'll be writing it in C#, so any tutorials for writing parsers in that language would be gratefully received.
UPDATE: I'd also be interested in answers that involve F# - I've been looking for a reason to use that in a project.
The only kind of parser that can be handwritten by a sane human being is a recursive-descent. That said, writing bottom-up parser by hand is still possible but is very undesirable.
If you're up for RD parser you have to verify that SQL grammar is not left-recursive (and eliminate the recursion if necessary), and then basically write a function for each grammar rule. See this for further reference.
At the risk of offending the OP, writing a parser for a large langauge like some specific vendor's SQL by hand when good parser generator tools (such as ANTLR) are available is simply crazy. You'll spend far more time rewriting your parser by hand than you will fixing the "edge cases" using the parser generator, and you'll invariably have to go back and revise the parser anyway as the SQL standards move or you discover you misunderstood something else. If you don't understand your parsing technology well enough, invest the time to understand it. It won't take you months to figure out how to deal with the edge cases with the parser generator, and you've already admitted it you are willing to spend months doing it by hand.
Having said that, if you are hell-bent on redoing it manually, using a recursive descent parser is the best way to do it by hand.
(NOTE: OP clarified below that he wants a reference implementation. IMHO you should NEVER code a reference implementation of a grammar procedurally, so recursive descent is out.)
If I were you I would have another go at ANTLRv3 using the GUI ANTLRWorks which gives you a very convenient way of testing your grammar. We use ANTLR in our project and although the learning curve may be a bit steep in the beginning once you learn it is quite convenient. Also on their email newsletter there are a lot of people who are very helpful.
PS. IIRC they also have a SQL-grammar you could take a look at.
hth
Look at gplex and gppg, lexer and parser generators for .NET. They work well, and are based on the same (and almost compatible) input as lex and yacc, which is relatively easy to use.
I would also vote for using an existing parser + lexer.
The only reason I can see in doing it by hand:
Recursive Descent parsers are indeed the best, maybe only, parsers that can be built by hand. You will still have to bone-up on what exactly a formal, context-free language is and put your language in a normal form. I would personally suggest that you remove left-recursion and put your language in Greibach Normal Form. When you do that, the parser just about writes itself.
For example, this production:
becomes something simple like:
And there aren't any much more difficult cases here (what's more difficult is make sure your grammar is in exactly the right form but this allows you the control that you mentioned).
Anton's Link is also seems excellent.