Should fxam work for single precision floating poi

This question arose from Why is isnormal() saying a value is normal when it isn't?

A C compiler generates the following code which is supposed to detect if the 32-bit float passed in is Normal or not:

    flds    24(%esp)
    fxam; fstsw %ax;
    andw    $17664, %ax
    cmpw    $1024, %ax
    sete    %al

(full code can be viewed here).

Is this code correct? The program appears to behave incorrectly, saying a number is Normal when it isn't. We think that perhaps the number is being checked for double-precision normality here.
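The suspicion can be illustrated in portable C without touching the x87 unit. This is only a sketch: it assumes IEEE 754 `float` and `double`, and the helper names `classify_as_float` and `classify_as_double` are made up for illustration. It widens to `double` rather than to the 80-bit format, but the effect is the same in kind: a value that is subnormal at single precision fits in the normal range of a wider format.

```c
#include <math.h>

/* Classification of x at its own (single) precision. */
int classify_as_float(float x) {
    return fpclassify(x);
}

/* Classification of the same value after widening to double.
   The numeric value is unchanged, but double has a wider exponent
   range, so a float subnormal lands in double's normal range. */
int classify_as_double(float x) {
    double wide = x;
    return fpclassify(wide);
}
```

For example, `0x1p-127f` (half of `FLT_MIN`) should classify as `FP_SUBNORMAL` through the first helper and as `FP_NORMAL` through the second.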

I checked Intel's insn reference manual, as linked from https://stackoverflow.com/tags/x86/info.

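There is only one version of the fxam instruction, and it operates on the 80-bit x87 registers. So yes, the code tests the 80-bit temporary for normality, not the original 32-bit value. A single-precision denormal is representable as a normal number in the 80-bit format, so once flds has converted it, fxam reports Normal. (The and/cmp pair can't be shortened to test $1024, %eax: fxam encodes the class in C3, C2 and C0, and C2 is also set for infinities and denormals, so all three bits have to be checked.)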
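To make the constants in the compiler's code concrete, here is a small C sketch that decodes the fxam condition codes from a saved status word. The bit positions (C0 = bit 8, C2 = bit 10, C3 = bit 14) and the class table are from Intel's manual; the macro and function names are hypothetical, chosen for this example only.

```c
#include <stdint.h>

/* Condition-code bits in the x87 status word. */
#define X87_C0 (1u << 8)
#define X87_C2 (1u << 10)
#define X87_C3 (1u << 14)
#define X87_FXAM_MASK (X87_C3 | X87_C2 | X87_C0)   /* 0x4500 == 17664 */

/* Map a status word captured right after fxam to a class name.
   C1 (bit 9) holds the sign and is ignored here. */
const char *fxam_class(uint16_t status_word) {
    switch (status_word & X87_FXAM_MASK) {
    case 0:                 return "unsupported";
    case X87_C0:            return "nan";
    case X87_C2:            return "normal";       /* 0x0400 == 1024 */
    case X87_C2 | X87_C0:   return "infinity";
    case X87_C3:            return "zero";
    case X87_C3 | X87_C0:   return "empty";
    case X87_C3 | X87_C2:   return "denormal";
    default:                return "unlisted";     /* C3|C2|C0 all set */
    }
}

/* Same test as the compiler's `andw $17664` followed by `cmpw $1024`. */
int fxam_is_normal(uint16_t status_word) {
    return (status_word & X87_FXAM_MASK) == X87_C2;
}
```

With these definitions, `fxam_is_normal(0x0400)` is true, while a status word with C3 and C2 set (`0x4400`, denormal) or C2 and C0 set (`0x0500`, infinity) is rejected, which is why all three bits are masked before the compare.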

According to this, flds itself will raise a Denormal exception. I think this means it's testing the actual source, not the result of conversion to 80bit. That page says the denormal exception will set bits in the status word.

Intel's ref manual doesn't say anything about fld setting the status word, just stuff about setting the C1 flag, and leaving C0, C2, and C3 undefined. It does say you can get a #D FPU exception if the source is denormal, but that this won't happen if the source is in 80bit format.

I don't know if the status word will actually get set for denormals if FPU exceptions aren't enabled. I'm not an expert on this. My reading of this page (and the control-word section) is that the FPU status word is updated after most instructions. If the denormal-operand mask bit (DM) is set in the control word (which it is by default), then denormal operands set the denormal-operand flag (DE) in the status word. If it was cleared (unmasked), an exception would happen instead.
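As a rough sketch of the two bits involved, assuming the layout in Intel's manual (bit 1 of the control word is the denormal-operand mask, bit 1 of the status word is the denormal-operand flag, and the control word after finit is 0x037F), the checks look like this in C. The macro and function names are hypothetical, and the code only decodes saved words; it does not read the FPU itself.

```c
#include <stdint.h>

#define X87_CW_DEFAULT 0x037Fu     /* control word after finit: all exceptions masked */
#define X87_CW_DM      (1u << 1)   /* control word: denormal-operand mask bit */
#define X87_SW_DE      (1u << 1)   /* status word: denormal-operand flag */

/* Nonzero if a denormal operand only sets a flag instead of trapping. */
int denormal_exception_masked(uint16_t control_word) {
    return (control_word & X87_CW_DM) != 0;
}

/* Nonzero if an earlier instruction saw a denormal operand.
   The flag is sticky: it stays set until cleared, e.g. by fnclex. */
int denormal_flag_set(uint16_t status_word) {
    return (status_word & X87_SW_DE) != 0;
}
```

The second helper is the C equivalent of testing bit 1 of %al after storing the status word with fstsw %ax.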

So I think a function to test a float for denormal would look like:

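    isdenormalf:              # takes a pointer to the float in %rdi
        fnclex                # clear sticky flags left over from earlier x87 ops
        flds   (%rdi)         # sets FPU status based on the input to the 32->80bit conversion
        fnstsw %ax
        fstp   %st(0)         # pop
        test   $2, %al        # bit 1 = denormal flag; avoid 16 bit ops (%ax), they're slow on Intel
        setnz  %al            # 1 if the flag was set; or just branch on flags directly if your compiler's smart
        ret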

I haven't tried this, so it may be completely bogus. Writing this in a way that inlines without load/popping data that we want to keep loaded may be non-trivial. Maybe take an address arg, return a float (so it can be in an x87 register), and have an output arg with the condition.

I don't see an instruction that can check a float in an SSE register for denormal.

I think I have a (slow) way to test for denormals with SSE4.1 or AVX's ROUNDSS. You need to use a different version depending on the sign of the input.

For positive values:

1. Round towards +inf with denormals-are-zero.
2. Round towards +inf without denormals-are-zero.
3. If the two rounding results are different, then denormals-are-zero had an effect (meaning the input was denormal).

Negative numbers need to be rounded towards -inf, not +inf, otherwise -0.xx will always round to zero. So this would have a branch, two ROUNDSSes, and a compare. Bit-hacks on the IEEE floating point format would probably be faster.
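For comparison, the bit-hack version is short. This is a minimal sketch that assumes `float` is IEEE 754 binary32 (1 sign bit, 8 exponent bits, 23 mantissa bits); the function name `is_denormal_bits` is made up for the example. A value is denormal exactly when its exponent field is all zeros and its mantissa is nonzero, regardless of sign, so no branch on the sign is needed.

```c
#include <stdint.h>
#include <string.h>

/* Returns nonzero if x is a denormal (subnormal) single-precision value.
   Assumes float is IEEE 754 binary32. */
int is_denormal_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* well-defined type pun */
    uint32_t exponent = bits & 0x7F800000u;  /* 8-bit exponent field */
    uint32_t mantissa = bits & 0x007FFFFFu;  /* 23-bit mantissa field */
    return exponent == 0 && mantissa != 0;   /* zero exponent, nonzero mantissa */
}
```

Compilers typically turn the memcpy into a single register move, so this should come down to a couple of integer operations on the float's bit pattern.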