Best practices for writing a programming language

2019-01-22 18:27发布

Are there any best practices that I should follow while writing a parser?

7条回答
可以哭但决不认输i
2楼-- · 2019-01-22 18:42

Don't overuse regular expressions - while they have their place, they simply don't have the power to handle any kind of real parsing. You can push them, but you're eventually going to hit a wall or end up with an unmaintainable mess. You're better off finding a parser generator that can handle a larger language set. If you really don't want to get into tools, you can look at recursive descent parsers - it's a really simple pattern for hand-writing a small parser. They aren't as flexible or as powerful as the big parser generators, but they have a much shorter learning curve.

Unless you have very tight performance requirements, try and keep your layers separate - the lexer reads in individual tokens, the parser arranges those into a tree, and then semantic analysis checks over everything and links up references, and then a final phase to output whatever is being produced. Keeping the different parts of logic separate will make things easier to maintain later.

查看更多
乱世女痞
3楼-- · 2019-01-22 18:55

Read most of the Dragon book first.

Parsers are not complicated if you know how to build them, but they are NOT the type of thing that if you put in enough time, you'll eventually get there. It's way better to build on the existing knowledge base. (Otherwise expect to write it and throw it away a few dozen times).

查看更多
再贱就再见
4楼-- · 2019-01-22 18:57

First, don't try to apply the same techniques to parsing everything. There are numerous possible use cases, from something like IP addresses (a bit of ad hoc code) to C++ programs (which need an industrial-strength parser with feedback from the symbol table), and from user input (which needs to be processed very fast) to compilers (which normally can afford to spend a little time parsing). You might want to specify what you're doing if you want useful answers.

Second, have a grammar in mind to parse with. The more complicated it is, the more formal the specification needs to be. Try to err on the side of being too formal.

Third, well, that depends on what you're doing.

查看更多
手持菜刀,她持情操
5楼-- · 2019-01-22 18:59
  • Choose the right kind of parser, sometimes a Recursive Descendant will be enough, sometimes you should use an LR parser (also, there are many types of LR parsers).
  • If you have a complex grammar, build an Abstract Syntax Tree.
  • Try to identify very well what goes into the lexer, what is part of the syntax and what is a matter of semantics.
  • Try to make the parser the least coupled to the lexer implementation as possible.
  • Provide a good interface to the user so he is agnostic of the parser implementation.
查看更多
三岁会撩人
6楼-- · 2019-01-22 19:01

The received wisdom is to use parser generators + grammars and it seems like good advice, because you are using a rigorous tool and presumably reducing effort and potential for bugs in doing so.

To use a parser generator the grammar has to be context free. If you are designing the languauge to be parsed then you can control this. If you are not sure then it could cost you a lot of effort if you start down the grammar route. Even if it is context free in practice, unless the grammar is enormous, it can be simpler to hand code a recursive decent parser.

Being context free does not only make the parser generator possible, but it also makes hand coded parsers a lot simpler. What you end up with is one (or two) functions per phrase. Which is if you organise and name the code cleanly is not much harder to see than a grammar (if your IDE can show you call hierachies then you can pretty much see what the grammar is).

The advantages:-

  • Simpler build
  • Better performance
  • Better control of output
  • Can cope with small deviations, e.g. work with a grammar that is not 100% context free

I am not saying grammars are always unsuitable, but often the benefits are minimal and are often out weighed by the costs and risks.

(I believe the arguments for them are speciously appealing and that there is a general bias for them as it is a way of signaling that one is more computer-science literate.)

查看更多
Luminary・发光体
7楼-- · 2019-01-22 19:02

Few pieces of advice:

  • Know your grammar - write it down in a suitable form
  • Choose the right tool. Do it from within C++ with Spirit2x, or choose external parser tools like antlr, yacc, or whatever suits you
  • Do you need a parser? Maybe regexp will suffice? Or maybe hack a perl script to do the trick? Writing complex parsers take time.
查看更多
登录 后发表回答