Casting float to int (bitwise) in C

Posted 2019-01-06 21:10

Given the 32 bits that represent an IEEE 754 floating-point number, how can the number be converted to an integer, using integer or bit operations on the representation (rather than using a machine instruction or compiler operation to convert)?

I have the following function but it fails in some cases:

Input: int x (contains 32 bit single precision number in IEEE 754 format)

  if(x == 0) return x;

  unsigned int signBit = 0;
  unsigned int absX = (unsigned int)x;
  if (x < 0)
  {
      signBit = 0x80000000u;
      absX = (unsigned int)-x;
  }

  unsigned int exponent = 158;
  while ((absX & 0x80000000) == 0)
  {
      exponent--;
      absX <<= 1;
  }

  unsigned int mantissa = absX >> 8;

  unsigned int result = signBit | (exponent << 23) | (mantissa & 0x7fffff);
  printf("\nfor x: %x, result: %x",x,result);
  return result;

7 answers
Deceive 欺骗
#2 · 2019-01-06 21:29

C has the "union" to handle this type of view of data:

#include <stdio.h>

typedef union {
    int i;
    float f;
} u;

int main(void) {
    u u1;
    u1.f = 45.6789f;
    /* now u1.i holds the bit pattern of the float, viewed as an int */
    printf("%d\n", u1.i);
    return 0;
}
我只想做你的唯一
#3 · 2019-01-06 21:29

You cannot (meaningfully) convert a floating point number into an 'integer' (signed int or int) in this way.

It may end up having the integer type, but it's actually just an index into the encoding space of IEEE754, not a meaningful value in itself.

You might argue that an unsigned int serves dual purpose as a bit pattern and an integer value, but int does not.


Also there are platform issues with bit manipulation of signed ints.
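The distinction drawn above, between a value and an index into the encoding space, can be made concrete with a small sketch. It assumes a 32-bit IEEE 754 float; the helper names as_value and as_encoding are hypothetical and chosen only for this illustration.

```c
#include <stdint.h>
#include <string.h>

/* Numeric conversion: the float's value, truncated toward zero. */
int32_t as_value(float f)
{
    return (int32_t)f;
}

/* Reinterpretation: the same 32 bits read as an unsigned integer.
   Assumes sizeof(float) == 4 (IEEE 754 single precision). */
uint32_t as_encoding(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

For 1.0f, as_value yields 1 while as_encoding yields 0x3f800000: the second number identifies the encoding, not the quantity.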

Deceive 欺骗
#4 · 2019-01-06 21:33

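You can cast the float using a reference. A cast like this typically generates no code. Note, however, that reading a float object through an int reference violates the strict-aliasing rule, so the behavior is formally undefined even though common compilers produce the expected result.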

C++

float f = 1.0f;
int i = (int &)f;
printf("Float %f is 0x%08x\n", f, i);

Output:

Float 1.000000 is 0x3f800000

If you want a C++-style cast, use reinterpret_cast, like this:

int i = reinterpret_cast<int &>(f);

It does not work with expressions (rvalues); you have to store the result in a variable first.

    int i_times_two;
    float f_times_two = f * 2.0f;
    i_times_two = (int &)f_times_two;

    i_times_two = (int &)(f * 2.0f);
main.cpp:25:13: error: C-style cast from rvalue to reference type 'int &'
#5 · 2019-01-06 21:39

&x gives the address of x, so it has type float*.

(int*)&x casts that pointer to a pointer to int, i.e. to an int*.

*(int*)&x dereferences that pointer to produce an int value. It won't do what you expect on machines where int and float have different sizes.

And there could be endianness issues.

This solution was used in the fast inverse square root algorithm.
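Besides the size and endianness concerns, reading a float through an int* also runs against C's aliasing rules. A commonly used alternative is to copy the bytes with memcpy, which compilers generally reduce to a single move. The following is a minimal sketch under the assumption that float is 32 bits wide; float_to_bits and bits_to_float are hypothetical helper names, not taken from the answers here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is a 32-bit type on this platform.
   The build fails here if it is not. */
static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Copy the object representation of a float into a fixed-width integer. */
uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Inverse operation: rebuild the float from its bit pattern. */
float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Using uint32_t instead of int sidesteps the size mismatch at compile time and avoids sign-related surprises when shifting or masking the bits afterwards.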

ら.Afraid
#6 · 2019-01-06 21:41

(Somebody should double-check this answer, especially border cases and the rounding of negative values. Also, I wrote it for round-to-nearest. To reproduce C’s conversion, this should be changed to round-toward-zero.)

Essentially, the process is:

Separate the 32 bits into one sign bit (s), eight exponent bits (e), and 23 significand bits (f). We will treat these as two's-complement integers.

If e is 255, the floating-point object is either infinity (if f is zero) or a NaN (otherwise). In this case, the conversion cannot be performed, and an error should be reported.

Otherwise, if e is not zero, add 2^23 to f. (If e is not zero, the significand implicitly has a 1 bit at its front. Since f holds the 23 stored bits, adding 2^23 makes that bit explicit in f.)

Subtract 127 from e. (This converts the exponent from its biased/encoded form to the actual exponent. If we were doing a general conversion to any value, we would have to handle the special case when e is zero: Subtract 126 instead of 127. But, since we are only converting to an integer result, we can neglect this case, as long as the integer result is zero for these tiny input numbers.)

If s is 0 (the sign is positive) and e is 31 or more, then the value overflows a signed 32-bit integer (it is 2^31 or larger). The conversion cannot be performed, and an error should be reported.

If s is 1 (the sign is negative) and e is more than 31, then the value overflows a signed 32-bit integer (it is less than or equal to -2^32). If s is 1, e is 31, and f is greater than 2^23 (any of the original significand bits were set), then the value overflows a signed 32-bit integer (it is less than -2^31; if the original f were zero, it would be exactly -2^31, which does not overflow). In any of these cases, the conversion cannot be performed, and an error should be reported.

Now we have an s, an e, and an f for a value which does not overflow, so we can prepare the final value.

If s is 1, set f to -f.

The exponent value is for a significand between 1 (inclusive) and 2 (exclusive), but our significand starts with a bit at 2^23. So we have to adjust for that. If e is 23, our significand is correct, and we are done, so return f as the result. If e is greater than 23 or less than 23, we have to shift the significand appropriately. Also, if we are going to shift f right, we may have to round it, to get a result rounded to the nearest integer.

If e is greater than 23, shift f left e-23 bits. Return f as the result.

If e is less than -1, the floating-point number is between -½ and ½, exclusive. Return 0 as the result.

Otherwise, we will shift f right 23-e bits. However, we will first save the bits we need for rounding. Set r to the result of casting f to an unsigned 32-bit integer and shifting it left by 32-(23-e) bits (equivalently, left by 9+e bits). This takes the bits that will be shifted out of f (below) and “left adjusts” them in the 32 bits, so we have a fixed position where they start.

Shift f right 23-e bits.

If r is less than 2^31, do nothing (this is rounding down; the shift truncated bits). If r is greater than 2^31, add one to f (this is rounding up). If r equals 2^31, add the low bit of f to f. (If f is odd, add one to f. Of the two equally near values, this rounds to the even value.) Return f.
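The steps above can be sketched in C roughly as follows. This is an illustration under stated assumptions, not a verified reference implementation: the function name float_bits_to_int32 and its bool-plus-out-parameter error reporting are hypothetical. The sketch places the implicit leading 1 at 2^23 (0x800000), because the stored significand field is 23 bits wide, and derives all shift counts from that. It also differs from the prose in one respect: it rounds the magnitude and applies the sign at the end, which gives the same result for round-to-nearest-even while avoiding shifts of negative values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert the bit pattern of an IEEE 754 single to the nearest int32_t,
   ties to even. Returns false for NaN, infinity, or overflow. */
bool float_bits_to_int32(uint32_t bits, int32_t *out)
{
    assert(out != NULL);

    uint32_t s = bits >> 31;                     /* sign bit */
    int32_t  e = (int32_t)((bits >> 23) & 0xFF); /* biased exponent */
    uint32_t f = bits & 0x7FFFFFu;               /* 23 stored significand bits */

    if (e == 255)
        return false;                            /* infinity or NaN */
    if (e != 0)
        f |= 0x800000u;                          /* explicit leading 1 at 2^23 */
    e -= 127;                                    /* remove the bias */

    if (e > 31)
        return false;                            /* |value| >= 2^32 */
    if (e == 31 && !(s == 1 && f == 0x800000u))
        return false;                            /* only -2^31 itself fits */

    uint32_t mag;                                /* magnitude of the result */
    if (e >= 23) {
        mag = f << (e - 23);                     /* exact, at most 2^31 */
    } else if (e < -1) {
        mag = 0;                                 /* |value| < 1/2 */
    } else {
        uint32_t r = f << (9 + e);               /* discarded bits, left-aligned */
        mag = f >> (23 - e);
        if (r > 0x80000000u)
            mag += 1;                            /* more than half: round up */
        else if (r == 0x80000000u)
            mag += mag & 1u;                     /* exactly half: round to even */
    }

    /* Negate in 64 bits so that -2^31 is representable before narrowing. */
    *out = s ? (int32_t)(-(int64_t)mag) : (int32_t)mag;
    return true;
}
```

For example, the patterns 0x40200000 (2.5) and 0x40600000 (3.5) convert to 2 and 4 respectively, while 0x4f000000 (2^31) is rejected. To mimic C's own float-to-int conversion instead, the rounding branch would be dropped so the shift simply truncates.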

Summer. ? 凉城
#7 · 2019-01-06 21:41
// With the proviso that your compiler implementation uses
// the same number of bytes for an int as for a float:
// example float
float f = 1.234f;
// take the address of the float, cast it to a pointer to int, dereference
int i = *((int *)&f);
// take the address of the int, cast it to a pointer to float, dereference
float g = *((float *)&i);
// prints the original float, the round-tripped float, and the bit pattern
printf("%f %f %08x\n", f, g, (unsigned int)i);