GCC, SIMD intrinsics and fast-math concepts

Posted 2019-03-15 11:07

Question:

Hi all :)
I'm trying to get the hang of a few concepts regarding floating point, SIMD/math intrinsics and the -ffast-math flag in GCC. Specifically, I'm using MinGW with GCC 4.5.0 on an x86 CPU.

I've searched around for a while now, and this is what I (think I) understand at the moment:

When I compile with no flags, any FP code will be standard x87 with no SIMD instructions, and the math.h functions will be linked from msvcrt.dll.

When I use -mfpmath, -msse<n> and/or -march so that MMX/SSE/AVX code gets enabled, GCC actually uses SIMD instructions only if I also specify some optimization flags, like -O<n> or -ftree-vectorize. In that case the intrinsics are chosen automagically by GCC, and some math functions (I'm still talking about the standard math.h functions) become intrinsics or are optimized out by inline code, while others still come from msvcrt.dll. If I don't specify optimization flags, does any of this change?
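As an illustration of the automatic case described above, here is a minimal sketch (not from the original post) of a plain C loop that GCC may auto-vectorize. The function name add_arrays is hypothetical, and whether packed SSE instructions are actually emitted depends on the GCC version and flags; compiling with -S and reading the generated assembly is one way to check.

```c
/* Hypothetical example: element-wise float addition.
   Requires C99 (restrict, declarations in for loops); GCC 4.5 needs -std=c99.
   With e.g. -O3 (or -O2 -ftree-vectorize) plus -msse2 on 32-bit x86,
   GCC may compile this loop to packed SSE instructions such as addps;
   without optimization flags it stays a scalar loop. */
void add_arrays(float *restrict dst, const float *restrict a,
                const float *restrict b, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

For example, adding {1, ..., 8} and {8, ..., 1} with n = 8 fills dst with 9s whether or not the loop was vectorized; only the generated instructions differ.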

When I use specific SIMD data types (those available as GCC extensions via __attribute__((vector_size(N))), like v4si or v8qi), I have the option to call intrinsic functions directly, or again leave the automagic decision to GCC. GCC can still choose standard x87 code if I don't enable SIMD instructions via the proper flags. Again, if I don't specify optimization flags, does any of this change?
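A minimal sketch of the GCC vector extension types mentioned above, assuming a hypothetical helper add_v4si: the vector_size attribute declares the type, and ordinary operators like + work element-wise on it without calling any intrinsic.

```c
#include <string.h>

/* GCC vector extension: a 16-byte vector holding four ints.
   "v4si" is only the conventional typedef name used in the GCC manual,
   not a predefined type. */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper: adds two 4-int arrays through v4si values.
   GCC uses SSE2 instructions for the + when they are enabled, and
   otherwise synthesizes the operation with narrower operations.
   memcpy is used instead of sum[i] subscripting, which older GCC
   versions (such as 4.5) may not support on vector types. */
void add_v4si(const int a_in[4], const int b_in[4], int out[4])
{
    v4si a, b, sum;
    memcpy(&a, a_in, sizeof a);
    memcpy(&b, b_in, sizeof b);
    sum = a + b;                     /* element-wise addition */
    memcpy(out, &sum, sizeof sum);
}
```

With inputs {1, 2, 3, 4} and {10, 20, 30, 40}, out becomes {11, 22, 33, 44}.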

Please correct me if any of my statements are wrong :p

Now the questions:

  1. Do I ever have to include x86intrin.h to use intrinsics?
  2. Do I ever have to link against libm?
  3. What does fast-math have to do with any of this? I understand it relaxes the IEEE standard, but specifically, how? Are different standard functions used? Is some other library linked? Or are just a couple of flags set somewhere so that the standard library behaves differently?

Thanks to anybody who is going to help :D

Answer 1:

OK, I'm answering for anyone who, like me, is struggling a bit to grasp these concepts.

Optimizations enabled with -O<n> work on any kind of code, x87 FPU or SSE.

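-ffast-math seems to make a difference only for x87 code. Also, it doesn't seem to change the x87 FPU control word o_O. (Per the GCC manual, though, -ffast-math is shorthand for several optimization flags, such as -fno-math-errno, -funsafe-math-optimizations and -ffinite-math-only, so it can affect SSE code as well.)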
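To make the "flags set somewhere" part of question 3 concrete, here is a minimal sketch of two observable effects, assuming GCC: the __FAST_MATH__ macro that GCC predefines under -ffast-math, and a NaN check that relies on IEEE semantics. Both function names are hypothetical.

```c
/* Returns 1 if this translation unit was compiled with -ffast-math:
   GCC predefines the __FAST_MATH__ macro in that case. */
int compiled_with_fast_math(void)
{
#ifdef __FAST_MATH__
    return 1;
#else
    return 0;
#endif
}

/* NaN test via self-comparison: under IEEE 754 a NaN is the only value
   that compares unequal to itself. Because -ffast-math implies
   -ffinite-math-only, GCC may assume x is never NaN and fold this to 0,
   so the result is only reliable without fast-math. */
int is_nan_self_compare(double x)
{
    return x != x;
}
```

Compiling the same file with and without -ffast-math and calling is_nan_self_compare(0.0 / 0.0) may show the difference, although GCC is allowed, not required, to fold the comparison.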

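Builtins are always available. This can be disabled for some builtins with flags such as a strict ISO mode (-ansi or -std=c99, which drop the non-ISO ones) or -fno-builtin; the __builtin_-prefixed forms stay available either way.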
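A minimal sketch of the builtin behavior described above, assuming GCC on x86 (the wrapper names are hypothetical): the same absolute-value operation through the math.h name and through the always-available __builtin_ form.

```c
#include <math.h>

/* GCC recognizes fabs() from math.h as a builtin and normally expands it
   inline (typically an and-mask on the sign bit, no library call) unless
   -fno-builtin or -fno-builtin-fabs is given. */
double abs_via_math_h(double x)
{
    return fabs(x);
}

/* The __builtin_ form is recognized even with -fno-builtin and needs no
   header at all. */
double abs_via_builtin(double x)
{
    return __builtin_fabs(x);
}
```

Comparing the assembly of both functions with and without -fno-builtin is a quick way to see when a real call to fabs gets emitted.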

libm is used for the math functions that are not in the main C library (with glibc they live in libm, which is why you link with -lm there), but with MinGW it's just a dummy file, so at the moment it's useless to link to it.

GCC's special vector types seem useful mostly when calling the intrinsics directly; otherwise GCC can often vectorize the code on its own anyway.
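As an example of calling the intrinsics directly (and of question 1: the _mm_* names are defined in the headers, so you do need to include <xmmintrin.h> or <x86intrin.h>), here is a minimal sketch assuming an x86 target with SSE enabled. add4_sse is a hypothetical name; in GCC's headers __m128 is itself declared as a 16-byte float vector using the same vector_size extension.

```c
#include <xmmintrin.h>  /* SSE intrinsics; <x86intrin.h> also provides them */

/* Hypothetical helper: adds four floats at once with SSE intrinsics.
   x86 only; on 32-bit x86 compile with -msse (or higher), while on
   x86-64 SSE is always available. */
void add4_sse(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);       /* unaligned load of 4 floats */
    __m128 vb = _mm_loadu_ps(b);
    __m128 vsum = _mm_add_ps(va, vb);  /* packed single-precision add */
    _mm_storeu_ps(out, vsum);          /* unaligned store of 4 floats */
}
```

The unaligned load/store variants are used so the arrays need no 16-byte alignment; _mm_load_ps and _mm_store_ps are the aligned counterparts.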

Any corrections are welcome :)

Useful links:
fpu / sse control
gcc math
and the gcc manual on "Vector Extensions", "X86 Built-in functions" and "Other Builtins"