Workaround for this calculator parsing error

Posted 2019-09-19 19:37

Context :

I entered the expression 3.24 * 10^10 + 1 into a calculator that I wrote. The calculator first looks for the pattern number_a^number_b, parses the two numbers into doubles with Double.parseDouble(), computes Math.pow(number_a, number_b), and replaces the matched text with the result.

The calculator then looks for the pattern number_a * number_b and evaluates it the same way. At this point the expression has become 3.24E10 + 1. Now comes the tricky part. I designed the calculator to find the pattern number_a + number_b next and evaluate it. It does exactly that: it matches 10 + 1 inside 3.24E10 + 1 and returns, unexpectedly but justifiably, 3.24E11.0.

I am looking for a workaround that makes my calculator smart enough to handle such expressions.
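The failure described above can be reproduced in isolation. This is only a minimal sketch: the class name AdditionPitfall and the helper addOnce are made up for illustration, and it assumes whitespace has already been stripped and that the addition pattern mirrors the regex example given in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AdditionPitfall {
    // Hypothetical helper: one addition pass in the style of the question's code.
    static String addOnce(String input) {
        StringBuilder expression = new StringBuilder(input);
        Matcher m = Pattern.compile("([\\d\\.]+)\\+([\\d\\.]+)").matcher(expression);
        if (m.find()) {
            double sum = Double.parseDouble(m.group(1)) + Double.parseDouble(m.group(2));
            // The match is "10+1", not "3.24E10+1", because 'E' is not in [\d.]
            expression.replace(m.start(), m.end(), Double.toString(sum));
        }
        return expression.toString();
    }

    public static void main(String[] args) {
        System.out.println(addOnce("3.24E10+1")); // prints 3.24E11.0
    }
}
```

The character class [\d.] stops at the E, so the left operand of the addition is only the exponent digits.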

Important information - Regex example = ([\\d\\.]+)\\*([\\d\\.]+)

Code example -

// 'expression' is a StringBuilder.
// This is only a (modified) snippet of the actual code.
// Needs: java.util.regex.Matcher and java.util.regex.Pattern

Matcher m = Pattern.compile("([\\d\\.]+)\\^([\\d\\.]+)")
                   .matcher(expression.toString());
while (m.find()) {
    double d1 = Double.parseDouble(m.group(1));
    double d2 = Double.parseDouble(m.group(2));
    double d3 = Math.pow(d1, d2);
    expression.replace(m.start(), m.end(), Double.toString(d3));
    m.reset(expression); // rescan from the start: the replacement shifted the indices
}

PS: Based on how I presented the question, many people seem to think that my calculator is a failed attempt because regex won't take me very far. Of course, I agree that this is true and that far better algorithms may exist. I just want to make clear that:

1) Regex is only used for parsing expressions in direct form. I don't use regex for everything. Nested brackets are solved using recursion. Regex only comes into play at the last step, when all the processing work has been done and only simple calculation remains.

2) My calculator works fine. It can and does solve nested expressions gracefully. Proof - 2^3*2/4+1 --> 5.0, sin(cos(1.57) + tan(cos(1.57)) + 1.57) --> 0.9999996829318346, ((3(2log(10))+1)+1)exp(0) --> 8.0

3) It does not use too many 'crutches'. If you think I have written thousands of lines of code to obtain the desired functionality: no. 200 lines and that's it. And I have no intention of dumping my application (which is near completion).

2 Answers
▲ chillily
#2 · 2019-09-19 20:06

According to your comment, changing the regex from this:

([\\d\\.]+)\\*([\\d\\.]+)

to this works (shown here for the ^ operator):

(\\d+(\\.\\d+)?(e\\d+)?)\\^(\\d+(\\.\\d+)?(e\\d+)?)

To explain what I've changed: before, the pattern accepted numbers in formats such as:

  1. 1
  2. .5
  3. .......
  4. .3.76
  5. and so on

To overcome this, I added an optional decimal part ((\\.\\d+)?), which allows integers as well as decimals.
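To see the difference between the two number patterns, here is a small check. It is only a sketch; the class NumberPatternCheck and its method names are invented for illustration.

```java
import java.util.regex.Pattern;

class NumberPatternCheck {
    // The character-class pattern from the question: any run of digits and dots.
    static final Pattern LOOSE = Pattern.compile("[\\d\\.]+");
    // Digits with an optional fractional part, as in the revised regex.
    static final Pattern STRICT = Pattern.compile("\\d+(\\.\\d+)?");

    static boolean loose(String s)  { return LOOSE.matcher(s).matches(); }
    static boolean strict(String s) { return STRICT.matcher(s).matches(); }

    public static void main(String[] args) {
        for (String s : new String[] {"1", "2.5", ".5", ".......", ".3.76"}) {
            System.out.println(s + "  loose=" + loose(s) + "  strict=" + strict(s));
        }
    }
}
```

Note that the stricter pattern also rejects .5, so a leading digit becomes mandatory.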

Adding an optional scientific-notation part ((e\\d+)?) on both sides also allows the numbers to be written:

  1. As integers (2 ^ 5)
  2. As decimals (2.3 ^ 5.7)
  3. And as scientific (2.345e2 ^ 5e10)

You can of course mix all variants up.

But keep in mind the comments below your question. Regex may be useful for small pieces, but it gets clumsy, slow and messy as the equations grow.

Also if you want to support negative numbers, you can add optional hyphens in front of the bases and the exponents:

(-?\\d+(\\.\\d+)?(e-?\\d+)?)\\^(-?\\d+(\\.\\d+)?(e-?\\d+)?)
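For what it's worth, here is how the pattern might be dropped into the loop from the question. Treat it as a sketch under a few assumptions, not as the original code: the class PowerPass and the method evaluatePowers are hypothetical names. Three details differ from the patterns above. Double.toString() writes the exponent marker as an uppercase E, so the sketch accepts [eE]. The inner groups are non-capturing, so the operands stay at group(1) and group(2); with the capturing version the second operand would be group(4). The leading -? is left out, because in an input such as 3-2^2 it would swallow the binary minus.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PowerPass {
    // One number: digits, optional fraction, optional exponent (e or E, optional minus).
    // (?:...) groups do not capture, so each number occupies exactly one group.
    static final String NUM = "(\\d+(?:\\.\\d+)?(?:[eE]-?\\d+)?)";
    static final Pattern POWER = Pattern.compile(NUM + "\\^" + NUM);

    // Hypothetical helper: replaces every a^b with its value, as in the question's loop.
    static String evaluatePowers(String input) {
        StringBuilder expression = new StringBuilder(input);
        Matcher m = POWER.matcher(expression);
        while (m.find()) {
            double base = Double.parseDouble(m.group(1));
            double exponent = Double.parseDouble(m.group(2));
            expression.replace(m.start(), m.end(),
                               Double.toString(Math.pow(base, exponent)));
            m.reset(expression); // rescan, since the text changed
        }
        return expression.toString();
    }

    public static void main(String[] args) {
        System.out.println(evaluatePowers("3.24*10^10")); // prints 3.24*1.0E10
    }
}
```

The same NUM fragment could be reused for the * and + passes, so that 3.24E10 is always consumed as a single number.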
家丑人穷心不美
#3 · 2019-09-19 20:14

"if you could provide me a justification for why the regex is not a good fit"

  1. A true regular expression cannot properly parse nested / balanced brackets. (OK, it is possible to use advanced regex features to do it, but the result is hellishly difficult to understand; see footnote 1.)

  2. A true regular expression will have difficulty analyzing an expression with operators that have different precedence. Especially with brackets. (I'm not sure if it is impossible, but it is certainly difficult.)

  3. Once you have used your regex(es) to match the expression, you then have the problem of sorting out the "groups" that you have matched into something that allows you to (correctly) evaluate the expression.

  4. A regex cannot produce any explanation if the input is syntactically invalid.

  5. Complicated regexes are often pathologically expensive ... especially for large input strings that are incorrect.

"what exactly do the other algorithms have that make them superior."

A properly written or generated lexer + parser will have none of the above problems. You can either evaluate the expression on the fly, or you can turn it into a parse tree that can be evaluated repeatedly; e.g. with different values for variables.
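To make that concrete, here is a minimal sketch of a hand-written recursive-descent evaluator that works on the fly. The class name ExprParser and the grammar are my own illustration, not taken from the calculator in the question; functions such as sin and implicit multiplication are left out, and stripping all whitespace up front is a simplification.

```java
class ExprParser {
    private final String src;
    private int pos = 0;

    private ExprParser(String text) { this.src = text.replaceAll("\\s+", ""); }

    static double eval(String text) {
        ExprParser p = new ExprParser(text);
        double value = p.expression();
        if (p.pos != p.src.length())
            throw new IllegalArgumentException("Unexpected '" + p.src.charAt(p.pos) + "' at index " + p.pos);
        return value;
    }

    // expression := term (('+' | '-') term)*
    private double expression() {
        double value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term := unary (('*' | '/') unary)*
    private double term() {
        double value = unary();
        while (true) {
            if (accept('*')) value *= unary();
            else if (accept('/')) value /= unary();
            else return value;
        }
    }

    // unary := '-' unary | power
    private double unary() { return accept('-') ? -unary() : power(); }

    // power := primary ('^' unary)?   (right-associative, binds tighter than unary minus)
    private double power() {
        double base = primary();
        return accept('^') ? Math.pow(base, unary()) : base;
    }

    // primary := number | '(' expression ')'
    private double primary() {
        if (accept('(')) {
            double value = expression();
            if (!accept(')')) throw new IllegalArgumentException("Expected ')' at index " + pos);
            return value;
        }
        int start = pos;
        while (pos < src.length() && (isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        // Optional exponent such as E10 or e-3, consumed as part of the same number.
        if (pos > start && pos < src.length() && (src.charAt(pos) == 'e' || src.charAt(pos) == 'E')) {
            int mark = pos++;
            if (pos < src.length() && (src.charAt(pos) == '+' || src.charAt(pos) == '-')) pos++;
            int firstDigit = pos;
            while (pos < src.length() && isDigit(src.charAt(pos))) pos++;
            if (pos == firstDigit) pos = mark; // no digits followed, so it was not an exponent
        }
        if (pos == start) throw new IllegalArgumentException("Expected a number at index " + pos);
        return Double.parseDouble(src.substring(start, pos));
    }

    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    private boolean accept(char c) {
        if (pos < src.length() && src.charAt(pos) == c) { pos++; return true; }
        return false;
    }
}
```

With this sketch, ExprParser.eval("3.24E10 + 1") reads 3.24E10 as one number, and an input like "(1+2" fails with a message that names the position of the problem.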

The shunting-yard algorithm (while of more limited application) also has none of the above problems.
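As a rough illustration of the shunting-yard idea, the sketch below converts infix tokens to postfix and then evaluates the postfix list with a stack. The class ShuntingYard and its method names are hypothetical, and the sketch assumes binary operators only (no unary minus, no functions). A regex is still used here, but only as a tokenizer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShuntingYard {
    // A token is a number (optional fraction and exponent) or one operator/bracket.
    private static final Pattern TOKEN =
        Pattern.compile("\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?|[-+*/^()]");

    private static int precedence(String op) {
        switch (op) {
            case "^": return 3;
            case "*": case "/": return 2;
            default: return 1; // "+" and "-"
        }
    }

    static List<String> toPostfix(String input) {
        String text = input.replaceAll("\\s+", "");
        List<String> output = new ArrayList<>();
        Deque<String> ops = new ArrayDeque<>();
        Matcher m = TOKEN.matcher(text);
        int expected = 0;
        while (m.find()) {
            if (m.start() != expected) break; // a character was skipped: reported below
            expected = m.end();
            String t = m.group();
            if (Character.isDigit(t.charAt(0))) {
                output.add(t);
            } else if (t.equals("(")) {
                ops.push(t);
            } else if (t.equals(")")) {
                while (!ops.isEmpty() && !ops.peek().equals("(")) output.add(ops.pop());
                if (ops.isEmpty()) throw new IllegalArgumentException("Unbalanced ')'");
                ops.pop(); // discard the "("
            } else {
                // Pop operators that bind tighter, or equally tight if t is left-associative.
                while (!ops.isEmpty() && !ops.peek().equals("(")
                        && (precedence(ops.peek()) > precedence(t)
                            || (precedence(ops.peek()) == precedence(t) && !t.equals("^")))) {
                    output.add(ops.pop());
                }
                ops.push(t);
            }
        }
        if (expected != text.length())
            throw new IllegalArgumentException("Unexpected character at index " + expected);
        while (!ops.isEmpty()) {
            String op = ops.pop();
            if (op.equals("(")) throw new IllegalArgumentException("Unbalanced '('");
            output.add(op);
        }
        return output;
    }

    static double evaluate(String input) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String t : toPostfix(input)) {
            if (Character.isDigit(t.charAt(0))) { stack.push(Double.parseDouble(t)); continue; }
            if (stack.size() < 2) throw new IllegalArgumentException("Missing operand for " + t);
            double right = stack.pop();
            double left = stack.pop();
            switch (t) {
                case "+": stack.push(left + right); break;
                case "-": stack.push(left - right); break;
                case "*": stack.push(left * right); break;
                case "/": stack.push(left / right); break;
                default:  stack.push(Math.pow(left, right)); // "^"
            }
        }
        if (stack.size() != 1) throw new IllegalArgumentException("Malformed expression");
        return stack.pop();
    }
}
```

For example, ShuntingYard.toPostfix("2^3*2/4+1") yields 2 3 ^ 2 * 4 / 1 +, which evaluates to 5.0.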


This is about picking the right tool for the job. And also about recognizing that regexes are NOT the right tool for every job.


1 - If you want to explore the rabbit warren of using regexes to parse nested structures, here is an entrance.
