I'm in the process of converting a program to C++ from Scilab (similar to Matlab) and I'm required to maintain the same level of precision that is kept by the previous code.
Note: Although maintaining the same level of precision would be ideal. It's acceptable if there is some error with the finished result. The problem I'm facing (as I'll show below) is due to looping, so the calculation error compounds rather quickly. But if the final result is only a thousandth or so off (e.g. 1/1000 vs 1/1001) it won't be a problem.
I've briefly looked into a number of different ways to do this including:
Int vs Float Example: Instead of using the float 12.45, store it as an integer being 124,500. Then simply convert everything back when appropriate to do so. Note: I'm not exactly sure how this will work with the code I'm working with (more detail below).
An example of how my program is producing incorrect results:
for (int i = 0; i <= 1000; i++)
{
for (int j = 0; j <= 10000; j++)
{
// This calculation will be computed with less precision than in Scilab
float1 = (1.0 / 100000.0);
// The above error of float2 will become significant by the end of the loop
float2 = (float1 + float2);
}
}
My question is:
Is there a generally accepted way to go about retaining accuracy in floating point arithmetic OR will one of the above methods suffice?
Well, another alternative if you use GCC compilers is to go with quadmath/__float128 types.
If your compiler supports it use BCD (Binary-coded decimal)
Sam
"Generally accepted" is too broad, so no.
Yes. Particularly gmp seems to be a standard choice. I would also have a look at the Boost Multiprecision library.
A hand-coded integer approach can work as well, but is surely not the method of choice: it requires much more coding, and more severe a means to store and process aritrarily precise integers.
Maintaining precision when porting code like this is very difficult to do. Not because the languages have implicitly different perspectives on what a
float
is, but because of what the different algorithms or assumptions of accuracy limits are. For example, when performing numerical integration in Scilab, it may use a Gaussian quadrature method. Whereas you might try using a trapezoidal method. The two may both be working on identical IEEE754 single-precision floating point numbers, but you will get different answers due to the convergence characteristics of the two algorithms. So how do you get around this?Well, you can go through the Scilab source code and look at all of the algorithms it uses for each thing you need. You can then replicate these algorithms taking care of any pre- or post-conditioning of the data that Scilab implicitly does (if any at all). That's a lot of work. And, frankly, probably not the best way to spend your time. Rather, I would look into using the Interfacing with Other Languages section from the developer's documentation to see how you can call the Scilab functions directly from your C, C++, Java, or Fortran code.
Of course, with the second option, you have to consider how you are going to distribute your code (if you need to).Scilab has a GPL-compatible license, so you can just bundle it with your code. However, it is quite big (~180MB) and you may want to just bundle the pieces you need (e.g., you don't need the whole interpreter system). This is more work in a different way, but guarantees numerical-compatibility with your current Scilab solutions.