I have to evaluate a large number of expressions containing variables and I am thinking about writing a small custom interpreter to keep compilation fast and small. However I have no experience with this topic and have a few questions.
Say we have a file with mathematical expressions and a limited set of objects. The file could look like:
expr[x,y,z] = 2*x*y + x^2 + 28/14*z*(x*y^2 + 15*z) + ...
I'd like to parse this somehow so I can evaluate the expressions numerically in my application
by simply calling a function expr(float x, float y, float z)
. The number of parameters should not be fixed (EDIT -: every expression would have its own definition with the appropriate number of parameters or would accept an array) and nesting of parenthesis should be allowed to keep the input files reasonably small.
Since the expressions are all of polynomial type, I can think of how the data structure should look like, but parsing looks difficult. I have already found some answers to somewhat similar questions here on SO, for instance using Lua
.
The biggest question, however, is what the performance penalty would be when creating and calling those objects as compared to directly compile these expressions from automatically generated C code.
Thanks in advance!
EDIT -: Please consider the example of expr()
above only as such. I guess the best way would be to have objects of a template class that holds coefficients and powers of the variables in sparse arrays.
Performance is a bit of a length-of-a-piece-of-string issue. Interpreted languages pretty much always are slower than compiled C code to evaluate arithmetic expressions. But not that many programs spend the majority of their time doing arithmetic, so most of the time that doesn't matter. It also makes a difference whether you parse the expression every time you evaluate it or (as seems more likely from what you say), parse it into some intermediate form.
It's impossible to tell from what you've said, whether it will matter to you, or how fast an interpreter you will write, but I wouldn't expect it to be better than 10 times slower, as far as time spent evaluating the expressions is concerned. First attempts at interpretation have been far worse.
As for that intermediate form - the usual place to start is to use Dijkstra's "shunting-yard" algorithm to convert your infix expressions to a reverse Polish form. That gives you a sequence of "symbols", "byte codes", call them what you like, and it's easy to write an expression evaluator for that form - each operator just pops its operands from a stack, performs the op, then pushes the result onto the stack, until the final value of the expression is the only thing left at the end. Numeric literals and variable names are just like "operators" that pop no operands, and push their value.
[Edit - depending who your users are, it might be feasible for your program to take that text file, generate a C program from it, run a compiler and then run the resulting program (or, open and call into the resulting dll). Obviously that relies on a lot of system-specific stuff (a compiler being installed, for one), and the expressions would need to be evaluated enough times that the overhead of compilation is overcome.]
You stated the problem as "big complex expressions", and you are concerned about performance penalties. Then you should consider compiling them, not interpreting them.
(good interpreters are 10x slower than compiled code as a rule of thumb; lousy/ad hoc interpreters tend to be significantly worse).
The usual route for this is to "compile" the expressions somehow, which involves building parsers, code generators, optimizations, etc.
C compilers already do all this. So I think
you'd be far better off translating these expressions to C.
Compiling them is then easy and execution will be lightning fast compared to anything you can hope to do as interpreter. That can also be done
using a parser and much simpler syntax directed translation.
But If these expressions are all produced by Mathematica they'll have a pretty standard but not complicated structure. In this case I'd guess you could write a regexp-based translator that could map the Mathematica forms into C functions with not a lot of trouble; Perl would be ideal for this. This gives you an easy to implement, and very fast solution.
For what it is worth, I believe Mathematica has an option to convert Mathematica expressions directly into C. Seems that that would be worth checking out, too.
There is a simple example in the Bison Manual.