How to write a language with Python-like indentation?

Posted 2019-06-21 17:22

I'm writing a tool with its own built-in language, similar to Python. I want indentation to be meaningful in the syntax, so that the tabs and spaces at the beginning of a line represent the nesting of commands.

What is the best way to do this?

I've written recursive-descent and finite automata parsers before.

Tags: parsing
3 answers
Melony?
#2 · 2019-06-21 17:33

Check out Python's compiler package (part of the Python 2 standard library), and in particular compiler.parse.
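The compiler package was removed in Python 3, so as a rough substitute here is a small sketch that uses the standard tokenize module to show where Python itself emits indentation tokens. The helper name indentation_tokens is made up for this illustration.

```python
import io
import tokenize


def indentation_tokens(source):
    """Return the names of the INDENT/DEDENT tokens Python emits for source."""
    readline = io.StringIO(source).readline
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.INDENT, tokenize.DEDENT)
    ]


if __name__ == "__main__":
    sample = "if a:\n    if b:\n        c\nd\n"
    print(indentation_tokens(sample))
```

Running it on a few nested snippets is a quick way to see when blocks open and close before designing your own token stream.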

相关推荐>>
#3 · 2019-06-21 17:38

I'd suggest ANTLR for any lexer/parser generation ( http://www.antlr.org ).

Also, this website ( http://erezsh.wordpress.com/2008/07/12/python-parsing-1-lexing/ ) has some more information, in particular:

Python’s indentation cannot be solved with a DFA. (I’m still perplexed at whether it can even be solved with a context-free grammar).

PyPy produced an interesting post about lexing Python (they intend to solve it using post-processing the lexer output)

CPython’s tokenizer is written in C. It’s ad-hoc, hand-written, and complex. It is the only official implementation of Python lexing that I know of.

何必那么认真
#4 · 2019-06-21 17:52

CPython's current parser seems to be generated from a grammar file, and its AST node types are described using something called ASDL.

Regarding the indentation you're asking for, it's done using special lexer tokens called INDENT and DEDENT. To replicate that, just implement those tokens in your lexer (that is pretty easy if you use a stack to store the starting columns of previous indented lines), and then plug them into your grammar as usual (like any other keyword or operator token).
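A minimal sketch of that stack-based approach, under some assumptions of mine: the function name tokenize_indentation, the LINE token kind, and the tab_width parameter are hypothetical, and each non-blank line is passed through as a single LINE token rather than being split into real tokens.

```python
def tokenize_indentation(text, tab_width=8):
    """Yield (kind, value) pairs where kind is INDENT, DEDENT, or LINE."""
    indents = [0]  # stack of starting columns of the enclosing blocks
    for raw in text.splitlines():
        line = raw.expandtabs(tab_width)  # assumption: tabs expand to tab stops
        stripped = line.strip()
        if not stripped:
            continue  # blank lines do not change nesting
        column = len(line) - len(line.lstrip(" "))
        if column > indents[-1]:
            # deeper than the current block: open a new one
            indents.append(column)
            yield ("INDENT", column)
        else:
            # shallower: close blocks until a matching column is found
            while column < indents[-1]:
                indents.pop()
                yield ("DEDENT", column)
            if column != indents[-1]:
                raise ValueError(f"inconsistent dedent to column {column}")
        yield ("LINE", stripped)
    # close any blocks still open at end of input
    while len(indents) > 1:
        indents.pop()
        yield ("DEDENT", 0)


if __name__ == "__main__":
    for token in tokenize_indentation("a\n  b\n    c\n  d\ne\n"):
        print(token)
```

In a real lexer the LINE payload would be tokenized further, and the grammar would then treat INDENT and DEDENT like opening and closing braces around a block.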
