We use the Shunting-Yard algorithm to evaluate expressions. We can validate the expression by simply applying the algorithm. It fails if there are missing operands, miss-matched parenthesis, and other things. The Shunting-Yard algorithm however has a larger supported syntax than just human readable infix. For example,
1 + 2
+ 1 2
1 2 +
are all acceptable ways to provide '1+2' as input to the Shunting-Yard algorithm. '+ 1 2' and '1 2 +' are not valid infix, but the standard Shunting-Yard algorithm can handle them. The algorithm does not really care about the order, it applies operators by order of precedence grabbing the 'nearest' operands.
We would like to restrict our input to valid human readable infix. I am looking for a way to either modify the Shunting-Yard algorithm to fail with non-valid infix or provide an infix validation prior to using Shunting-Yard.
Is anyone aware of any published techniques to do this? We must support both basic operator, custom operators, brackets, and functions (with multiple arguments). I haven't seen anything that works with more than the basic operators online.
Thanks
The solution to my problem was to enhance the algorithm posted on Wikipedia with the state machine recommended by Rici. I am posting the pseudo code here because it may be of use to others.
You can easily differentiate between unary and binary operators (I'm specifically speaking about the negative prefix and subtraction operator) by looking at the previous token. If there is no previous token, the previous token is an open parenthesis, or the previous token is an operator then you have encountered a unary prefix operator, else you have encountered the binary operator.
A nice discussion on Shunting Yard algorithms is http://www.engr.mun.ca/~theo/Misc/exp_parsing.htm The algorithm presented there uses the key idea of the operator stack but has some grammar to know what should be expected next. It has two main functions
E()
which expects an expression andP()
which is expecting either a prefix operator, a variable, a number, brackets and functions. Prefix operators always bind tighter than binary operators so you want to deal this any of then first.If we say P stands for some prefix sequence and B is a binary operator then any expression will be of the form
i.e. you are either expecting a a prefix sequence or a binary operator. Formally the grammar is
and P will be
This translates to psudocode as
You can expand this to handle other unary operators, functions, brackets, suffix operators.