Separate a double into its sign, exponent and

Posted 2019-09-08 10:20

I've read a few topics that take an already broken-down double and "put it together", but I am trying to break a double into its base components. So far I have the sign bit nailed down:

void breakDouble( double d ){

    long L = *(long*) &d;

    int sign;
    long mask = 0x8000000000000000L;

    if( (L & mask) == mask ){

        sign = 1;

    } else {

        sign = 0;
    }
    ...
}

But I'm pretty stumped as to how to get the exponent and the mantissa. I got away with forcing the double into a long because only the leading bit mattered, so truncation didn't play a role. However, I don't think that will work for the other parts, and I know you can't apply bitwise operators to floats, so I'm stuck.

Thoughts?


Edit: of course, as soon as I post this I find this, but I'm not sure how different floats and doubles are in this case.


Edit 2 (sorry, working as I go): I read the post I linked in edit 1, and it seems to me that I can perform the same operations on my double, with the mask for the exponent being:

mask = 0x7FF0000000000000L;

and for the mantissa:

mask = 0xFFFFFFFFFFFFFL;

Is this correct?

Tags: c double
1 answer
Lonely孤独者°
Reply #2 · 2019-09-08 11:10

The bit masks you posted in your second edit look right. However, you should be aware that:

  1. Dereferencing (long *)&mydouble as you do is a violation of C's aliasing rules. This still flies under most compilers if you pass a flag like gcc's -fno-strict-aliasing, but it can lead to problems if you don't. You can cast to char * (preferably unsigned char *) and look at the bytes that way instead. It's more annoying and you have to worry about endianness, but you don't run the risk of the compiler's optimizer breaking your code. You can also create a union type like the one at the bottom of the post, write into the d member, and read from the other three.

  2. Minor portability note: long isn't the same size everywhere (it is only 32 bits on 64-bit Windows, for example), so consider using uint64_t from <stdint.h> instead. (double isn't the same everywhere either, but it's fairly clear that this is intended to apply only to IEEE doubles.)

  3. The trickery with bit-masks only gives directly usable values for so-called "normal" floating-point numbers: those with a biased exponent that is neither zero (indicating zero or a subnormal) nor 2047 (indicating infinity or NaN).

  4. As Raymond Chen points out, the frexp function probably does what you actually want. frexp handles the subnormal, infinity, and NaN cases in a documented and sane way, but you pay a speed hit for using it.
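To make point 4 concrete, here is a minimal sketch of the frexp route. The struct parts type and the split_double helper are names invented for this illustration; only frexp, fabs, and signbit come from <math.h>. Note that frexp returns a fraction in [0.5, 1), so its exponent is one larger than the unbiased IEEE exponent, and for infinity or NaN the stored exponent is unspecified.

```c
#include <math.h>

/* Hypothetical result type for this sketch (not from any library). */
struct parts {
    int sign;        /* 1 if the sign bit is set, 0 otherwise */
    int exponent;    /* power of two: |d| == fraction * 2^exponent */
    double fraction; /* in [0.5, 1) for finite non-zero input, 0 for zero */
};

/* Hypothetical helper: split d using only standard library calls,
   with no bit-level access to the representation. */
static struct parts split_double(double d)
{
    struct parts p;

    /* signbit() also reports the sign of -0.0 and of NaNs. */
    p.sign = signbit(d) ? 1 : 0;

    /* frexp() normalizes subnormals too; for infinity or NaN the
       value written to p.exponent is unspecified. */
    p.fraction = frexp(fabs(d), &p.exponent);

    return p;
}
```

For example, split_double(-6.0) yields sign 1, exponent 3, and fraction 0.75, because -6 = -0.75 × 2^3.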

(Apparently there needs to be some non-list text between a list and a code block. Here it is; eat it up, markdown!)

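/* Bit-field layout is implementation-defined; this order assumes a
   little-endian target that allocates fields from the low bits up
   (e.g. gcc on x86). The anonymous struct needs C11 or an extension. */
union doublebits {
  double d;
  struct {
    unsigned long long mant : 52;  /* fraction bits, no implicit leading 1 */
    unsigned long long expo : 11;  /* biased exponent */
    unsigned long long sign : 1;   /* 1 if negative */
  };
};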
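As an alternative to both the pointer cast and the union, a common approach (not one proposed in the answer above) is to copy the bytes with memcpy into a uint64_t and then apply the masks from the question. The sketch below assumes a 64-bit IEEE 754 double whose byte order matches that of 64-bit integers; struct double_fields and break_double are names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three raw fields of an IEEE 754 double. */
struct double_fields {
    unsigned sign;      /* bit 63 */
    unsigned exponent;  /* bits 62..52, still biased by 1023 */
    uint64_t mantissa;  /* bits 51..0, without the implicit leading 1 */
};

/* Hypothetical helper: extract the raw fields without breaking aliasing rules. */
static struct double_fields break_double(double d)
{
    uint64_t bits;
    struct double_fields f;

    /* Assumption: double is 64 bits wide (IEEE 754 binary64). */
    assert(sizeof d == sizeof bits);

    /* memcpy copies the object representation; no pointer cast is involved. */
    memcpy(&bits, &d, sizeof bits);

    f.sign     = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits & UINT64_C(0x7FF0000000000000)) >> 52);
    f.mantissa = bits & UINT64_C(0x000FFFFFFFFFFFFF);
    return f;
}
```

For a normal number (exponent field neither 0 nor 2047), the value is (-1)^sign × (1 + mantissa / 2^52) × 2^(exponent - 1023). For instance, break_double(-2.5) gives sign 1, exponent 1024, and mantissa 2^50, since -2.5 = -1.25 × 2^1.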