I'd like to temporarily enable FTZ/DAZ modes to get a performance gain for some code where strict compliance with the IEEE 754 standard is not an issue, without changing the behaviour of other threads, which could be executing code where that compliance is important.
I've been reading this on how to enable/disable these modes and this on the performance impact of denormal handling, but unfortunately I have mixed code in a multithreaded environment, so I cannot enable these modes once and for all.
My understanding is that since the MXCSR register's flags determine the behaviour of the hardware, and since every thread has its own register context, setting these flags will only affect the behaviour of the current thread.
Is that correct?
Yes, MXCSR is part of the per-thread architectural state saved/restored by context switches, along with the xmm/ymm/zmm and x87 stack registers (using xsave/xrstor). Different threads have their own FPU state.
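As a rough illustration of the per-thread behaviour described above, here is a minimal sketch in C that turns FTZ and DAZ on around a block of code and then puts the two bits back the way they were. The helper names (ftz_daz_enable, ftz_daz_restore, produces_subnormal) and the FTZ_DAZ_MASK macro are made up for this example; only _mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON and _MM_DENORMALS_ZERO_ON come from the intrinsics headers. It assumes x86-64, where float math is done in SSE registers and is therefore governed by MXCSR.

```c
#include <float.h>      /* FLT_MIN */
#include <immintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_FLUSH_ZERO_ON, _MM_DENORMALS_ZERO_ON */

/* Hypothetical name: the two MXCSR bits this sketch touches (FTZ = bit 15, DAZ = bit 6). */
#define FTZ_DAZ_MASK ((unsigned int)(_MM_FLUSH_ZERO_ON | _MM_DENORMALS_ZERO_ON))

/* Hypothetical helper: enable FTZ and DAZ for the calling thread only.
   Returns the previous MXCSR value so the caller can restore it later. */
static unsigned int ftz_daz_enable(void)
{
    unsigned int saved = _mm_getcsr();
    _mm_setcsr(saved | FTZ_DAZ_MASK);
    return saved;
}

/* Hypothetical helper: put the FTZ/DAZ bits back to their saved state.
   Only those two bits are rewritten, so rounding mode, exception masks and
   any sticky exception flags raised in the meantime are left as they are. */
static void ftz_daz_restore(unsigned int saved)
{
    unsigned int now = _mm_getcsr();
    _mm_setcsr((now & ~FTZ_DAZ_MASK) | (saved & FTZ_DAZ_MASK));
}

/* Demo probe: halve the smallest normal float. Returns 1 if the result is a
   nonzero subnormal, 0 if it was flushed to zero. The volatile qualifiers
   keep the compiler from folding the multiplication at compile time. */
static int produces_subnormal(void)
{
    volatile float tiny = FLT_MIN;
    volatile float half = 0.5f;
    volatile float r = tiny * half;
    return r != 0.0f;
}
```

A typical use would be to call ftz_daz_enable() at the top of the performance-critical region, run the code that tolerates flushing, and pass the returned value to ftz_daz_restore() before the thread goes back to code that needs strict IEEE 754 behaviour. Other threads keep whatever MXCSR value they already had.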
Interesting idea. I'd always figured DAZ was only useful if you had denormal constants or something (or data from a file), but having other threads running without FTZ is another source of denormals.
You might also want to compile some files with -ffast-math, or a subset of those options. Note that linking with -ffast-math in gcc will include a CRT function that sets DAZ/FTZ before main(), so don't do that.
The optimizations enabled by fast-math are mostly orthogonal to whether denormals are flushed to zero. Even just -fno-math-errno lets more math functions inline (better, or at all), e.g. sqrtf, and is totally safe if you don't care about errno being set as well as getting a NaN result.