I'm writing a tool with it's own built-in language similar to Python. I want to make indentation meaningful in the syntax (so that tabs and spaces at line beginning would represent nesting of commands).
What is the best way to do this?
I've written recursive-descent and finite automata parsers before.
Check out the python compiler and in particular
compiler.parse
.I'd suggest ANTLR for any lexer/parser generation ( http://www.antlr.org ).
Also, this website ( http://erezsh.wordpress.com/2008/07/12/python-parsing-1-lexing/ ) has some more information, in particular:
The current CPython's parser seems to be generated using something called ASDL.
Regarding the indentation you're asking for, it's done using special lexer tokens called
INDENT
andDEDENT
. To replicate that, just implement those tokens in your lexer (that is pretty easy if you use a stack to store the starting columns of previous indented lines), and then plug them into your grammar as usual (like any other keyword or operator token).