How would I code a complex formula parser manually

2019-04-27 02:19发布

Hm, this is language - agnostic, I would prefer doing it in C# or F#, but I'm more interested this time in the question "how would that work anyway".

What I want to accomplish ist:

a) I want to LEARN it - it's about my ego this time, it's for a fun project where I want to show myself that I'm a really good at this stuff

b) I know a tiny little bit about EBNF (although I don't know yet, how operator precedence works in EBNF - Irony.NET does it right, I checked the examples, but this is a bit ominous to me)

c) My parser should be able to take this: 5 * (3 + (2 - 9 * (5 / 7)) + 9) for example and give me the right results

d) To be quite frankly, this seems to be the biggest problem in writing a compiler or even an interpreter for me. I would have no problem generating even 64 bit assembler code (I CAN write assembler manually), but the formula parser...

e) Another thought: even simple computers (like my old Sharp 1246S with only about 2kB of RAM) can do that... it can't be THAT hard, right? And even very, very old programming languages have formula evaluation... BASIC is from 1964 and they already could calculate the kind of formula I presented as an example

f) A few ideas, a few inspirations would be really enough - I just have no clue how to do operator precedence and the parentheses - I DO, however, know that it involves an AST and that many people use a stack

So, what do you think?

4条回答
你好瞎i
2楼-- · 2019-04-27 02:35

If you want to go for an existing solution I can recommend a working, PSR-0 compatible implementation of the shunting yard algorithm: https://github.com/andig/php-shunting-yard/tree/dev.

查看更多
趁早两清
3楼-- · 2019-04-27 02:43

For a stack-based parser implemented in PHP that uses Djikstra's shunting yard algorithm to convert infix to postfix notation, and with support for functions with varying number of arguments, you can look at the source for the PHPExcel calculation engine

查看更多
欢心
4楼-- · 2019-04-27 02:50

Traditionally formula processors on computers use POSTFIX notation. They use a stack, pop 2 items as operands, pop the third item as the operator, and push the result.

What you want is an INFIX to POSTFIX notation converter which is really quite simple. Once you're in postfix processing is the simplest thing you'll ever do.

查看更多
Viruses.
5楼-- · 2019-04-27 02:54

You should go learn about Recursive Descent parsers.

Check out a Code Golf exercise in doing just this, 10 different ways:

Code Golf: Mathematical expression evaluator (that respects PEMDAS)

Several of these "golf" solutions are recursive descent parsers just coded in different ways.

You'll find that doing just expression parsing is by far the easiest thing in a compiler. Parsing the rest of the language is harder, but understanding how the code elements interact and how to generate good code is far more difficult.

You may also be interested in how to express a parser using BNF, and how to do something with that BNF. Here's an example of how to parse and manipulate algebra symbolically with an explicit BNF and an implicit AST as a foundation. This isn't what compilers traditionally do, but the machinery that does is founded deeply in compiler technology.

查看更多
登录 后发表回答