How to perform a bitwise operation on floating poi

Posted 2019-01-01 09:13

Question:

I tried this:

float a = 1.4123;
a = a & (1 << 3);

I get a compiler error saying that the operand of & cannot be of type float.

When I do:

float a = 1.4123;
a = (int)a & (1 << 3);

The program compiles and runs, but the bitwise operation is applied to the integer obtained by truncating the number (the cast discards the fractional part) rather than to the bits of the float itself.
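To make the effect of the cast concrete, here is a small sketch (the function name masked_after_cast is made up for illustration). The conversion to int happens first, so the mask only ever sees the truncated integer:

```cpp
// Hypothetical helper: mirrors `(int)a & (1 << 3)` from the snippet above.
int masked_after_cast(float a)
{
    // The cast truncates toward zero, so 1.4123f becomes 1 before the mask is applied.
    return static_cast<int>(a) & (1 << 3);
}
```

With a = 1.4123f the result is 0, because 1 & 8 is 0. With a = 9.9f it is 8, because the value is truncated to 9 first.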

The following is also not allowed.

float a = 1.4123;
a = (void*)a & (1 << 3);

I don't understand why int can be cast to void* but not float.

I am doing this to solve the problem described in the Stack Overflow question "How to solve linear equations using a genetic algorithm?"

Answer 1:

At the language level, there's no such thing as a "bitwise operation on floating-point numbers". Bitwise operations in C/C++ work on the value representation of a number, and the value representation of floating-point numbers is not defined in C/C++. Floating-point numbers don't have bits at the level of value representation, which is why you can't apply bitwise operations to them.

All you can do is analyze the bit content of the raw memory occupied by the floating-point number. For that you need to either use a union as suggested below or (equivalently, and only in C++) reinterpret the floating-point object as an array of unsigned char objects, as in

float f = 5;
unsigned char *c = reinterpret_cast<unsigned char *>(&f);
// inspect memory from c[0] to c[sizeof f - 1]

And please, don't try to reinterpret a float object as an int object, as other answers suggest. That doesn't make much sense, it is illegal, and it is not guaranteed to work in compilers that apply strict-aliasing rules during optimization. The only legal way to inspect memory content in C++ is by reinterpreting it as an array of [signed/unsigned] char.
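In the same spirit, std::memcpy copies the object representation byte by byte, so it is a commonly used way to get the bits into an integer without aliasing a float as an int. This is a minimal sketch, assuming float is a 32-bit IEEE 754 value. The names float_bits, FloatFields and split_fields are made up for illustration:

```cpp
#include <cstdint>
#include <cstring>

// Assumption: float occupies exactly 32 bits on this platform.
static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Hypothetical helper: copy the bytes of f into an unsigned integer.
std::uint32_t float_bits(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // byte-wise copy, no float-as-int aliasing
    return bits;
}

// Hypothetical holder for the three fields of an IEEE 754 binary32 value.
struct FloatFields
{
    std::uint32_t sign;     // 1 bit
    std::uint32_t exponent; // 8 bits, biased by 127
    std::uint32_t mantissa; // 23 bits, without the implicit leading 1
};

FloatFields split_fields(float f)
{
    const std::uint32_t bits = float_bits(f);
    return FloatFields{ bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
}
```

On an IEEE 754 system, float_bits(5.0f) is 0x40A00000: sign 0, biased exponent 129 (that is, 2^2) and mantissa 0x200000 (the fraction 0.25), since 5 = 1.25 * 2^2.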

Also note that you technically aren't guaranteed that the floating-point representation on your system is IEEE 754 (although in practice it is unless you explicitly allow it not to be, and then only with respect to -0.0, ±infinity and NaN).
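If code depends on that layout, the assumption can at least be checked instead of taken on faith. A minimal sketch using std::numeric_limits follows. The function names are made up for illustration, and is_iec559 only reports what the implementation claims about itself:

```cpp
#include <climits>
#include <limits>

// True when the implementation reports that float follows IEC 559 (IEEE 754).
constexpr bool float_is_ieee754()
{
    return std::numeric_limits<float>::is_iec559;
}

// True when float occupies exactly 32 bits.
constexpr bool float_is_32_bits()
{
    return sizeof(float) * CHAR_BIT == 32;
}
```

Both functions are constexpr, so they can be used in a static_assert to stop compilation on a platform where the assumption does not hold.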



Answer 2:

If you are trying to change the bits in the floating-point representation, you could do something like this:

union fp_bit_twiddler {
    float f;
    int i;
} q;
q.f = a;
q.i &= (1 << 3);
a = q.f;

As AndreyT notes, accessing a union like this invokes undefined behavior, and the compiler could grow arms and strangle you. Do what he suggests instead.



Answer 3:

float a = 1.4123;
unsigned int* inta = reinterpret_cast<unsigned int*>(&a);
*inta = *inta & (1 << 3);


Answer 4:

Have a look at the following. Inspired by fast inverse square root:

#include <iostream>
using namespace std;

int main()
{
    float x, td = 2.0;
    int ti = *(int*) &td;
    cout << "Cast int: " << ti << endl;
    ti = ti>>4;
    x = *(float*) &ti;
    cout << "Recast float: " << x << endl;
    return 0; 
}


Answer 5:

@mobrule:

Better:

#include <stdint.h>
...
union fp_bit_twiddler {
    float f;
    uint32_t u;
} q;

/* mutatis mutandis ... */

For these values int will likely be ok, but generally, you should use unsigned ints for bit shifting to avoid the effects of arithmetic shifts. And the uint32_t will work even on systems whose ints are not 32 bits.
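A small sketch of the difference, with function names made up for illustration. The unsigned shift always brings in zeros from the left. The result of right-shifting a negative signed value was implementation-defined before C++20 and is an arithmetic (sign-propagating) shift from C++20 on:

```cpp
#include <cstdint>

// Logical shift: zeros are shifted in from the left, whatever the top bit was.
// count must be less than 32.
std::uint32_t shift_right_unsigned(std::uint32_t bits, unsigned count)
{
    return bits >> count;
}

// Signed shift: for negative values the result was implementation-defined
// before C++20; common compilers propagate the sign bit.
std::int32_t shift_right_signed(std::int32_t value, unsigned count)
{
    return value >> count;
}
```

For example, shift_right_unsigned(0x80000000u, 4) is 0x08000000, whereas shifting the same bit pattern as a negative int32_t typically fills the top four bits with ones.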



Answer 6:

The Python implementation in "Floating point bitwise operations (Python recipe)" works by representing numbers in binary that extends infinitely to the left as well as to the right of the fractional point. Because floating-point numbers have a signed zero on most architectures, it uses ones' complement for representing negative numbers (well, actually it just pretends to do so and uses a few tricks to achieve the appearance).

I'm sure it can be adapted to work in C++, but care must be taken not to let the right shifts overflow when equalizing the exponents.



Answer 7:

Bitwise operators should NOT be used on floats, because the float representation is hardware-specific, regardless of how similar it looks across whatever hardware you happen to have. Which project/job do you want to risk on "well, it worked on my machine"? Instead, for C++, you can get a similar "feel" for the bit-shift operators by overloading operator>> and operator<< (the same operators used for streams) on an "object" wrapper for a float:

// Simple object wrapper for float type as templates want classes.
class Float
{
    float m_f;
public:
    Float( const float & f )
    : m_f( f )
    {
    }

    operator float() const
    {
        return m_f;
    }
};

// Right "shift": halve the value once per shift position.
float operator>>( const Float & left, int right )
{
    float temp = left;
    for( ; right > 0; --right )
    {
        temp /= 2.0f;
    }
    return temp;
}

// Left "shift": double the value once per shift position.
float operator<<( const Float & left, int right )
{
    float temp = left;
    for( ; right > 0; --right )
    {
        temp *= 2.0f;
    }
    return temp;
}

int main( int argc, char ** argv )
{
    int a1 = 40 >> 2;   // 10
    int a2 = 40 << 2;   // 160
    int a3 = 13 >> 2;   // 3
    int a4 = 256 >> 2;  // 64
    int a5 = 255 >> 2;  // 63

    float f1 = Float( 40.0f ) >> 2;   // 10.0
    float f2 = Float( 40.0f ) << 2;   // 160.0
    float f3 = Float( 13.0f ) >> 2;   // 3.25
    float f4 = Float( 256.0f ) >> 2;  // 64.0
    float f5 = Float( 255.0f ) >> 2;  // 63.75
}

The float results keep a fractional remainder (for example, 3.25 instead of 3), which you can discard depending on your desired implementation.
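As a possible variation (not part of the code above), the repeated division can be replaced by a single std::ldexp call, which scales a value by a power of two, and the remainder can be dropped with std::floor. The helper names here are made up for illustration, and the truncating version is only meant for non-negative inputs:

```cpp
#include <cmath>

// Hypothetical helper: scale by 2^-count in one call instead of a loop.
float shift_right_scaled(float value, int count)
{
    return std::ldexp(value, -count);
}

// Hypothetical helper: discard the remainder to mimic an integer right shift
// (intended for non-negative values).
float shift_right_truncated(float value, int count)
{
    return std::floor(std::ldexp(value, -count));
}
```

With these, shift_right_scaled(13.0f, 2) gives 3.25 and shift_right_truncated(13.0f, 2) gives 3.0, matching 13 >> 2 on integers.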



Answer 8:

float a = 1.4123;
int *b = (int *)&a;
*b = *b & (1 << 3);
// a is now the IEEE floating-point value caused by the manipulation of *b
// equals 1.121039e-44 (tested on my system)

This is similar to Justin's response, except that it only creates a view of the bits in the same memory as a. So when you manipulate *b, a's value changes accordingly.
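For comparison, the same masking can be sketched without the pointer cast by copying the bytes out, masking them as an integer, and copying them back. The name and_float_bits is made up for illustration, and the sketch assumes a 32-bit float:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: AND the raw bits of a float with an integer mask.
float and_float_bits(float value, std::uint32_t mask)
{
    // Assumption: float and std::uint32_t have the same size.
    static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // read the representation
    bits &= mask;                             // ordinary integer bitwise AND
    std::memcpy(&value, &bits, sizeof value); // write the modified bits back
    return value;
}
```

On an IEEE 754 system, and_float_bits(1.4123f, 1u << 3) leaves only bit 3 set, which is the same tiny denormal value as in the snippet above, and and_float_bits(-2.0f, 0x7FFFFFFFu) clears the sign bit and yields 2.0f.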



Answer 9:

FWIW, there is a real use case for bit-wise operations on floating point (I just ran into it recently) - shaders written for GPUs that only support older versions of GLSL (1.2 and earlier did not have support for bit-wise operators), and where there would be loss of precision if the floats were converted to ints.

The bit-wise operations can be implemented on floating point numbers using remainders (modulo) and inequality checks. For example:

float A = 0.625; //value to check; ie, 160/256
float mask = 0.25; //bit to check; ie, 1/4
bool result = (mod(A, 2.0 * mask) >= mask); //true if bit 0.25 is on in A

The above assumes that A is in the range [0, 1) and that there is only one "bit" in mask to check, but it could be generalized for more complex cases.
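One way such a generalization might look, sketched in C++ with std::fmod standing in for the shader's mod, is to test each fractional bit in turn and rebuild the result. The function names and the default of 8 bits are assumptions made for illustration, and both inputs are assumed to be in [0, 1):

```cpp
#include <cmath>

// Hypothetical helper: true if the single fractional bit `mask`
// (0.5, 0.25, 0.125, ...) is set in a, for a in [0, 1).
bool fraction_bit_set(float a, float mask)
{
    return std::fmod(a, 2.0f * mask) >= mask;
}

// Hypothetical helper: bitwise AND of the first `bits` fractional bits of a and b.
float fraction_and(float a, float b, int bits = 8)
{
    float result = 0.0f;
    float mask = 0.5f; // most significant fractional bit
    for (int i = 0; i < bits; ++i)
    {
        if (fraction_bit_set(a, mask) && fraction_bit_set(b, mask))
        {
            result += mask;
        }
        mask *= 0.5f; // move to the next lower bit
    }
    return result;
}
```

For example, 0.625 is 0.101 in binary and 0.75 is 0.110, so fraction_and(0.625f, 0.75f) is 0.5 (binary 0.100).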

This idea is based on some of the info found in the question "Is it possible to implement bitwise operators using integer arithmetic?"

If there is not even a built-in mod function, then that can also be implemented fairly easily. For example:

float mod(float num, float den)
{
    return num - den * floor(num / den);
}