Minimum and maximum of signed zero

2019-02-12 20:03发布

问题:

I am concerned about the following cases

min(-0.0,0.0)
max(-0.0,0.0)
minmag(-x,x) 
maxmag(-x,x)

According to Wikipedia IEEE 754-2008 says in regards to min and max

The min and max operations are defined but leave some leeway for the case where the inputs are equal in value but differ in representation. In particular:

min(+0,−0) or min(−0,+0) must produce something with a value of zero but may always return the first argument.

I did some tests compare fmin, fmax, min and max as defined below

#define max(a,b) \
   ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
     _a > _b ? _a : _b; })
#define min(a,b) \
   ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
     _a < _b ? _a : _b; })

and _mm_min_ps and _mm_max_ps which call the SSE minps and maxps instruction.

Here are the results (the code I used to test this is posted below)

fmin(-0.0,0.0)       = -0.0
fmax(-0.0,0.0)       =  0.0
min(-0.0,0.0)        =  0.0
max(-0.0,0.0)        =  0.0
_mm_min_ps(-0.0,0.0) =  0.0
_mm_max_ps(-0.0,0.0) = -0.0

As you can see each case returns different results. So my main question is what does the C and C++ standard libraries say? Does fmin(-0.0,0.0) have to equal -0.0 and fmax(-0.0,0.0) have to equal 0.0 or are different implementations allowed to define it differently? If it's implementation defined does this mean that to insure the code is compatible with different implementation of the C standard library (.e.g from different compilers) that checks must be done to determine how they implement min and max?

What about minmag(-x,x) and maxmag(-x,x)? These are both defined in IEEE 754-2008. Are these implementation defined at least in IEEE 754-2008? I infer from Wikepdia's comment on min and max that these are implementation defined. But the C standard library does not define these functions as far as I know. In OpenCL these functions are defined as

maxmag Returns x if | x| > |y|, or y if |y| > |x|, otherwise fmax(x, y).

minmag Returns x if |x| < |y|, or y if |y| < |x|, otherwise fmin(x, y).

The x86 instruction set has no minmag and maxmag instructions so I had to implement them. But in my case I need performance and creating a branch for the case when the magnitudes are equal is not efficient.

The Itaninum instruction set has minmag and maxmag instructions (famin and famax) and in this case as far as I can tell (from reading) in this case it returns the second argument. That's not what minps and maxps appear to be doing though. It's strange that _mm_min_ps(-0.0,0.0) = 0.0 and _mm_max_ps(-0.0,0.0) = -0.0. I would have expected them to either return the first argument in both cases or the second. Why are the minps and maxps instructions defined this way?

#include <stdio.h>
#include <x86intrin.h>
#include <math.h>

#define max(a,b) \
   ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
     _a > _b ? _a : _b; })

#define min(a,b) \
   ({ __typeof__ (a) _a = (a); \
       __typeof__ (b) _b = (b); \
     _a < _b ? _a : _b; })

int main(void) {
    float a[4] = {-0.0, -1.0, -2.0, -3.0};   
    float b[4] = {0.0, 1.0, 2.0, 3.0};
    __m128 a4 = _mm_load_ps(a);
    __m128 b4 = _mm_load_ps(b);
    __m128 c4 = _mm_min_ps(a4,b4);
    __m128 d4 = _mm_max_ps(a4,b4);
    { float c[4]; _mm_store_ps(c,c4); printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]); }
    { float c[4]; _mm_store_ps(c,d4); printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]); }

    printf("%f %f %f %f\n", fmin(a[0],b[0]), fmin(a[1],b[1]), fmin(a[2],b[2]), fmin(a[3],b[3]));
    printf("%f %f %f %f\n", fmax(a[0],b[0]), fmax(a[1],b[1]), fmax(a[2],b[2]), fmax(a[3],b[3]));

    printf("%f %f %f %f\n", min(a[0],b[0]), min(a[1],b[1]), min(a[2],b[2]), min(a[3],b[3]));
    printf("%f %f %f %f\n", max(a[0],b[0]), max(a[1],b[1]), max(a[2],b[2]), max(a[3],b[3]));    
}
//_mm_min_ps: 0.000000, -1.000000, -2.000000, -3.000000
//_mm_max_ps: -0.000000, 1.000000, 2.000000, 3.000000
//fmin: -0.000000, -1.000000, -2.000000, -3.000000
//fmax: 0.000000, 1.000000, 2.000000, 3.000000
//min: 0.000000, -1.000000, -2.000000, -3.000000
//max: 0.000000, 1.000000, 2.000000, 3.000000

Edit:

In regards to C++ I tested std::min(-0.0,0.0) and std::max(-0.0,0.0) and the both return -0.0. Which shows that that std::min is not the same as fmin and std::max is not the same as fmax.

回答1:

Why not read the standard yourself? The Wikipedia article for IEEE contains links to the standard.

Note: The C standard document is not available freely. But the final draft is (that's what I linked, search to find the pdf version). However, I've not seen the final document being cited here and AFAIK there had mostly been some typos corrected; nothing changed. IEEE is, however, available for free.

Note that a compiler need not stick to the standards (some embedded compilers/versions for instance do not implement IEEE-conforming floating point values, but are still C-conforming - just read the standard for details). So see the compiler documentation to see the compatibility. MS-VC for instance is not even compatible to C99 (and will never ben), while gcc and clang/llvm are (mostly) compatible to C11 in the current versions (gcc since 4.9.2 at least, in parts since 4.7).

In general, when using MS-VC, check if it actually does support that all standard features used. It is actually not fully compliant to the current standard, nor C99.



回答2:

The fundamental issue in this case is the actual underlying mathematics, ignoring representational issues. There are several implications in your question that I believe are erroneous. -0.0 < 0.0 is false. -0.0 is a negative number is false. 0.0 is a positive number is false. In fact, there's no such thing as -0.0, though there is an IEEE 754 representation of zero with a sign bit set.

In addition, the behavior of min/max functions is only a small slice of legal floating-point operations that can yield zeros with different sign bits. Since floating point units are free to return (-)0.0 for expressions like -7 - -7, you'd also have to figure out what to do with that. I'd also like to point out that |0.0| could in fact return 0.0 with the sign bit set, since -0.0 is an absolute value of 0.0. Put simply, as far as mathematics is concerned 0.0 is -0.0. They are the same thing.

The only way that you can test for 0.0 with a set sign bit is to abandon mathematical expressions and instead examine the binary representation of such values. But what's the point of that? There's only one legitimate case I can think of: the generation of binary data from two different machines that are required to be bit-for-bit identical. In this case, you'll need to also worry about signaling and quiet NaN values, since there are very many more aliases of these values (10^22-1 SNaN's and 10^22 QNaN's for single-precision floats, and about 10^51 values of each for double-precision).

In these situations where binary representation is critical (it's absolutely NOT for mathematical computation), then you'll have to write code to condition all floats on write (zeros, quiet NaN's, and signaling NaN's).

For any computational purpose, it's useless to worry about whether the sign bit is set or clear when the value is zero.