My day job includes working on a Pascal-like compiler. So far I've been working on optimizations and code generation.
I would also like to learn how to build a simple parser for the same language. However, I'm not really sure how to go about this. Flex and Bison seem to be the usual choice. But isn't it possible to write a parser using C++ or C#? I'm not very comfortable with C.
Yacc++ supports C#, but it's a commercially licensed product. I'm looking for all the help I can find in this regard. Suggestions would be highly appreciated.
Personally, I roll my own lexer and parser (LL). Here's a very abbreviated example. It is in C++, but hopefully you can adapt it. It uses a macro, PARSE_HIGHER, to make it easy to insert operators at different precedence levels without changing much code.
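To illustrate the PARSE_HIGHER idea, here is a minimal sketch of a hand-written recursive-descent (LL) expression evaluator. The class name `ExprParser`, its member functions, and the decision to evaluate on the fly rather than build a tree are all assumptions made for this sketch, so treat it as one possible shape and not as a definitive implementation. Each precedence level is written against PARSE_HIGHER, which is redefined just before that level.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch: evaluates + - * / with parentheses and unary minus.
class ExprParser {
public:
    explicit ExprParser(std::string text) : src_(std::move(text)) {}

    // Parses the whole input and returns its value.
    double parse() {
        double value = parseSum();
        skipSpace();
        if (pos_ != src_.size()) throw std::runtime_error("unexpected trailing input");
        return value;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;

    void skipSpace() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }

    // Consumes character c if it is the next non-blank character.
    bool accept(char c) {
        skipSpace();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }

    // Highest precedence: parenthesized expression, unary minus, number.
    double parsePrimary() {
        if (accept('(')) {
            double value = parseSum();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return value;
        }
        if (accept('-')) return -parsePrimary();
        skipSpace();
        std::size_t start = pos_;
        while (pos_ < src_.size() &&
               (std::isdigit(static_cast<unsigned char>(src_[pos_])) || src_[pos_] == '.')) ++pos_;
        if (start == pos_) throw std::runtime_error("expected a number");
        return std::stod(src_.substr(start, pos_ - start));
    }

    // Each level only names PARSE_HIGHER, so a new level can be slotted in
    // by adding one function and adjusting the neighbouring #define.
#define PARSE_HIGHER parsePrimary
    double parseProduct() {
        double value = PARSE_HIGHER();
        for (;;) {
            if (accept('*')) value *= PARSE_HIGHER();
            else if (accept('/')) value /= PARSE_HIGHER();
            else return value;
        }
    }
#undef PARSE_HIGHER

#define PARSE_HIGHER parseProduct
    double parseSum() {
        double value = PARSE_HIGHER();
        for (;;) {
            if (accept('+')) value += PARSE_HIGHER();
            else if (accept('-')) value -= PARSE_HIGHER();
            else return value;
        }
    }
#undef PARSE_HIGHER
};
```

With this sketch, `ExprParser("1 + 2 * 3").parse()` gives 7 because `parseSum` delegates each operand to the product level. A real compiler front end would build tree nodes where this code does arithmetic.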
Added some Pascal-style statement syntax:
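A minimal, self-contained sketch of what Pascal-style statement parsing could look like. The class `StmtParser` and all of its members are hypothetical names chosen for this illustration, and the choice to return the parse tree as an S-expression string (instead of generating code) is an assumption made to keep it short. It handles only `begin ... end`, `if ... then ... else`, and assignment, and it does not allow an empty statement before `end`.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical sketch: returns the parse tree as an S-expression string.
class StmtParser {
public:
    explicit StmtParser(std::string text) : src_(std::move(text)) { next(); }

    std::string parseProgram() {
        std::string tree = parseStatement();
        if (!tok_.empty()) throw std::runtime_error("unexpected '" + tok_ + "'");
        return tree;
    }

private:
    std::string src_;
    std::size_t pos_ = 0;
    std::string tok_;  // current token; empty at end of input

    static bool isAlnum(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }
    static bool isKeyword(const std::string& t) {
        return t == "begin" || t == "end" || t == "if" || t == "then" || t == "else";
    }

    // Scans the next token: ":=", a single symbol, or a run of letters/digits.
    void next() {
        while (pos_ < src_.size() && std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
        std::size_t start = pos_;
        if (src_.compare(pos_, 2, ":=") == 0) pos_ += 2;
        else if (pos_ < src_.size() && !isAlnum(src_[pos_])) ++pos_;
        else while (pos_ < src_.size() && isAlnum(src_[pos_])) ++pos_;
        tok_ = src_.substr(start, pos_ - start);
    }

    bool accept(const std::string& t) {
        if (tok_ != t) return false;
        next();
        return true;
    }
    void expect(const std::string& t) {
        if (!accept(t)) throw std::runtime_error("expected '" + t + "'");
    }

    // One left-associative level: operand (op operand)*, op a character in ops.
    std::string parseLevel(const std::string& ops, std::string (StmtParser::*operand)()) {
        std::string lhs = (this->*operand)();
        while (tok_.size() == 1 && ops.find(tok_) != std::string::npos) {
            std::string op = tok_;
            next();
            std::string rhs = (this->*operand)();
            lhs = "(" + op + " " + lhs + " " + rhs + ")";
        }
        return lhs;
    }
    std::string parseProduct()    { return parseLevel("*/", &StmtParser::parsePrimary); }
    std::string parseSum()        { return parseLevel("+-", &StmtParser::parseProduct); }
    std::string parseExpression() { return parseLevel("<>=", &StmtParser::parseSum); }

    // primary := name | number | '(' expression ')'
    std::string parsePrimary() {
        if (accept("(")) { std::string e = parseExpression(); expect(")"); return e; }
        if (tok_.empty() || isKeyword(tok_) || !isAlnum(tok_[0]))
            throw std::runtime_error("expected an operand");
        std::string t = tok_;
        next();
        return t;
    }

    // statement := 'begin' statement (';' statement)* 'end'
    //            | 'if' expression 'then' statement ['else' statement]
    //            | name ':=' expression
    std::string parseStatement() {
        if (accept("begin")) {
            std::string block = "(begin " + parseStatement();
            while (accept(";")) block += " " + parseStatement();
            expect("end");
            return block + ")";
        }
        if (accept("if")) {
            std::string cond = parseExpression();
            expect("then");
            std::string tree = "(if " + cond + " " + parseStatement();
            if (accept("else")) tree += " " + parseStatement();
            return tree + ")";
        }
        std::string name = tok_;
        if (name.empty() || isKeyword(name) || !std::isalpha(static_cast<unsigned char>(name[0])))
            throw std::runtime_error("expected a statement");
        next();
        expect(":=");
        return "(:= " + name + " " + parseExpression() + ")";
    }
};
```

For instance, `StmtParser("x := 1 + 2 * 3").parseProgram()` returns `(:= x (+ 1 (* 2 3)))`. A `while` or `for` statement would follow the same pattern as the `if` branch.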
It still needs syntax for array indexing, variable declaration, and function definition, but I hope it is clear how to do that.
If you were writing it in Java, I'd recommend ANTLR. It's a nice LL(*) parser generator written in Java. There's a terrific book for it on Amazon, too.
You can actually use flex & bison with C++. In this tutorial, for example, you can see that section 5 is dedicated to that matter. Just google for it, and I'm sure you will find lots of examples.
When you use Lex and Yacc you don't actually write much of anything in C. Lex is its own language, as is Yacc. So you write the lexical analyzer in Lex and the parser in Yacc. However, for Pascal, Lex and Yacc inputs are already available.
The resulting parser and lexer have C interfaces, that is true. However, most languages, including C++, have simple ways to call (or wrap) C interfaces.
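As a rough sketch of what wrapping a C interface from C++ can look like: the `toy_parser_*` functions below are hypothetical stand-ins defined in the same file so the example compiles on its own. They are not the functions that Lex or Yacc generate, and the stand-in "parser" merely checks that parentheses balance. The point is only the shape of the wrapper: a small class owns the C handle and releases it automatically.

```cpp
#include <memory>
#include <string>

// Stand-in for a C parser interface (hypothetical names, not generated code).
extern "C" {
    struct toy_parser { int errors; };

    toy_parser* toy_parser_create() { return new toy_parser{0}; }
    void toy_parser_destroy(toy_parser* p) { delete p; }

    // Returns 0 on success and nonzero on a syntax error. This stand-in
    // accepts any text whose parentheses are balanced.
    int toy_parser_parse(toy_parser* p, const char* text) {
        int depth = 0;
        for (const char* c = text; *c != '\0'; ++c) {
            if (*c == '(') ++depth;
            else if (*c == ')' && --depth < 0) break;
        }
        p->errors = (depth == 0) ? 0 : 1;
        return p->errors;
    }
}

// Calls the C destroy function when the owning unique_ptr goes out of scope.
struct ParserDeleter {
    void operator()(toy_parser* p) const { toy_parser_destroy(p); }
};

// Thin C++ wrapper: owns the C handle and exposes std::string-based calls.
class Parser {
public:
    Parser() : handle_(toy_parser_create()) {}

    // Returns true if the text was accepted by the underlying C parser.
    bool parse(const std::string& text) {
        return toy_parser_parse(handle_.get(), text.c_str()) == 0;
    }

private:
    std::unique_ptr<toy_parser, ParserDeleter> handle_;
};
```

Callers then write `Parser().parse("(a (b))")` and never touch the C handle directly.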
I'm not an expert in it, but I'm sure all of the above goes for ANTLR as well.
If you are asking to do it in "pure C++" (whatever that means), look into using Boost.Spirit. I don't really see the point in theoretical purity if it causes a ton more work, though.
Writing your own lexers and parsers by hand is actually kinda fun. A lexer is one of the very few situations where you can justify using both gotos and the preprocessor. However, I wouldn't suggest it for a full-blown language like Pascal if you can avoid it. That would be a lot of work. I'm talking man-years.
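For a sense of scale, here is a minimal sketch of a hand-written lexer for a few Pascal-like tokens. The names `TokenKind`, `Token`, and `tokenize` are assumptions made for this illustration, and the sketch uses plain loops where a larger lexer might use gotos or macros. It skips whitespace and `{ ... }` comments and recognizes the two-character symbols `:=`, `<=`, `>=`, and `<>`. Strings, real numbers, and `(* ... *)` comments are left out.

```cpp
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

enum class TokenKind { Identifier, Number, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Splits Pascal-like source text into tokens (hypothetical sketch).
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    // Returns the character at index k, or '\0' past the end of the input.
    auto at = [&](std::size_t k) { return k < src.size() ? src[k] : '\0'; };

    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == '{') {  // comment: skip to the closing brace
            std::size_t close = src.find('}', i);
            if (close == std::string::npos) throw std::runtime_error("unterminated comment");
            i = close + 1;
            continue;
        }
        std::size_t start = i;
        TokenKind kind;
        if (std::isalpha(c) || c == '_') {
            while (std::isalnum(static_cast<unsigned char>(at(i))) || at(i) == '_') ++i;
            kind = TokenKind::Identifier;
        } else if (std::isdigit(c)) {
            while (std::isdigit(static_cast<unsigned char>(at(i)))) ++i;
            kind = TokenKind::Number;
        } else {
            char n = at(i + 1);
            bool twoChar = (c == ':' && n == '=') ||
                           (c == '<' && (n == '=' || n == '>')) ||
                           (c == '>' && n == '=');
            i += twoChar ? 2 : 1;
            kind = TokenKind::Symbol;
        }
        out.push_back({kind, src.substr(start, i - start)});
    }
    return out;
}
```

Calling `tokenize("x := y1 <= 42 { note } ;")` yields six tokens: `x`, `:=`, `y1`, `<=`, `42`, and `;`. Keywords come out as identifiers here; a parser built on top would need to tell them apart.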
I believe you can use ANTLR with C#. I've never tried it myself (yet); however, there is a tutorial here that might point you in the right direction.
Bison and flex are the canonical parser and lexer generators. If you're interested in C++, I've found Boost.Spirit useful. I've never used it for anything as complex as a compiler, though. I'm sure others will have interesting suggestions for other languages such as C#...