Does ICC satisfy C99 specs for multiplication of c

2020-07-02 06:37发布

问题:

Consider this simple code:

#include <complex.h>
complex float f(complex float x) {
  return x*x;
}

If you compile it with -O3 -march=core-avx2 -fp-model strict using the Intel Compiler you get:

f:
        vmovsldup xmm1, xmm0                                    #3.12
        vmovshdup xmm2, xmm0                                    #3.12
        vshufps   xmm3, xmm0, xmm0, 177                         #3.12
        vmulps    xmm4, xmm1, xmm0                              #3.12
        vmulps    xmm5, xmm2, xmm3                              #3.12
        vaddsubps xmm0, xmm4, xmm5                              #3.12
        ret 

This is much simpler code than you get from both gcc and clang and also much simpler than the code you will find online for multiplying complex numbers. It doesn't, for example appear explicitly to deal with complex NaN or infinities.

Does this assembly meet the specs for C99 complex multiplication?

回答1:

The code is non-conforming.

Annex G, Section 5.1, Paragraph 4 reads

The * and / operators satisfy the following infinity properties for all real, imaginary, and complex operands:

— if one operand is an infinity and the other operand is a nonzero finite number or an infinity, then the result of the * operator is an infinity;

So if z = a * ib is infinite and w = c * id is infinite, the number z * w must be infinite.

The same annex, Section 3, Paragraph 1 defines what it means for a complex number to be infinite:

A complex or imaginary value with at least one infinite part is regarded as an infinity (even if its other part is a NaN).

So z is infinite if either a or b are.
This is indeed a sensible choice as it reflects the mathematical framework1.

However if we let z = ∞ + i∞ (an infinite value) and w = i∞ (and infinite value) the result for the Intel code is z * w = NaN + iNaN due to the ∞ · 0 intermediates2.

This suffices to label it as non-conforming.


We can further confirm this by taking a look at the footnote on the first quote (the footnote was not reported here), it mentions the CX_LIMITED_RANGE pragma directive.

Section 7.3.4, Paragraph 1 reads

The usual mathematical formulas for complex multiply, divide, and absolute value are problematic because of their treatment of infinities and because of undue overflow and underflow. The CX_LIMITED_RANGE pragma can be used to inform the implementation that (where the state is ‘‘on’’) the usual mathematical formulas [that produces NaNs] are acceptable.

Here the standard committee is trying to alleviate the huge mole of work for the complex multiplication (and division).
In fact GCC has a flag to control this behaviour:

-fcx-limited-range
When enabled, this option states that a range reduction step is not needed when performing complex division.

Also, there is no checking whether the result of a complex multiplication or division is NaN + I*NaN, with an attempt to rescue the situation in that case.

The default is -fno-cx-limited-range, but is enabled by -ffast-math.
This option controls the default setting of the ISO C99 CX_LIMITED_RANGE pragma.

It this option alone that makes GCC generate slow code and additional checks, without it the code it generate has the same flaws of Intel's one (I translated the source to C++)

f(std::complex<float>):
        movq    QWORD PTR [rsp-8], xmm0
        movss   xmm0, DWORD PTR [rsp-8]
        movss   xmm2, DWORD PTR [rsp-4]
        movaps  xmm1, xmm0
        movaps  xmm3, xmm2
        mulss   xmm1, xmm0
        mulss   xmm3, xmm2
        mulss   xmm0, xmm2
        subss   xmm1, xmm3
        addss   xmm0, xmm0
        movss   DWORD PTR [rsp-16], xmm1
        movss   DWORD PTR [rsp-12], xmm0
        movq    xmm0, QWORD PTR [rsp-16]
        ret

Without it the code is

f(std::complex<float>):
        sub     rsp, 40
        movq    QWORD PTR [rsp+24], xmm0
        movss   xmm3, DWORD PTR [rsp+28]
        movss   xmm2, DWORD PTR [rsp+24]
        movaps  xmm1, xmm3
        movaps  xmm0, xmm2
        call    __mulsc3
        movq    QWORD PTR [rsp+16], xmm0
        movss   xmm0, DWORD PTR [rsp+16]
        movss   DWORD PTR [rsp+8], xmm0
        movss   xmm0, DWORD PTR [rsp+20]
        movss   DWORD PTR [rsp+12], xmm0
        movq    xmm0, QWORD PTR [rsp+8]
        add     rsp, 40
        ret

and the __mulsc3 function is practically the same the standard C99 recommends for complex multiplication.
It includes the above mentioned checks.


1 Where the modulus of a number is extended from the real case |z| to the complex one ‖z‖, keeping the definition of infinite as the result of unbounded limits. Simply put, in the complex plane there is a whole circumference of infinite values and it takes just one "coordinate" to be infinite to get an infinite modulus.

2 The situation get worst if we remember that z = NaN + i∞ or z = ∞ + iNaN are valid infinite values



回答2:

I get similar, but not identical, code from clang 3.8 at -O2 -march=core-avx2 -ffast-math: I'm not deeply familiar with latter-day x86 floating point features, but I think it's doing the same calculation but using different instructions to shuffle values around in registers.

f:
        vmovshdup       %xmm0, %xmm1    # xmm1 = xmm0[1,1,3,3]
        vaddss  %xmm0, %xmm0, %xmm2
        vmulss  %xmm2, %xmm1, %xmm2
        vmulss  %xmm1, %xmm1, %xmm1
        vfmsub231ss     %xmm0, %xmm0, %xmm1
        vinsertps       $16, %xmm2, %xmm1, %xmm0 # xmm0 = xmm1[0],xmm2[0],xmm1[2,3]
        retq

GCC 6.3, with the same options, again appears to do the same calculation but shuffle values around in yet a third way:

f:
        vmovq   %xmm0, -8(%rsp)
        vmovss  -4(%rsp), %xmm2
        vmovss  -8(%rsp), %xmm0
        vmulss  %xmm2, %xmm2, %xmm1
        vfmsub231ss     %xmm0, %xmm0, %xmm1
        vmulss  %xmm2, %xmm0, %xmm0
        vmovss  %xmm1, -16(%rsp)
        vaddss  %xmm0, %xmm0, %xmm0
        vmovss  %xmm0, -12(%rsp)
        vmovq   -16(%rsp), %xmm0
        ret

Without -ffast-math, both compilers generate substantially different code that does appear to check for NaN, at least.

I conclude from this that Intel's compiler is not generating a fully IEEE-compliant complex multiplication even with -fp-model strict. There may be some other command-line switch that makes it generate fully IEEE-compliant code.

Whether or not this qualifies as a violation of C99 depends on whether Intel's compiler is documented to conform to Annex F and G (which specify what it means for an implementation of C to provide IEEE-compliant real and complex arithmetic), and if it is, what command-line options you have to give to get the conforming mode.