What is the best algorithm to achieve the following:
0010 0000 => 0000 0100
The conversion is from MSB->LSB to LSB->MSB. All bits must be reversed; that is, this is not endianness-swapping.
What is the best algorithm to achieve the following:
0010 0000 => 0000 0100
The conversion is from MSB->LSB to LSB->MSB. All bits must be reversed; that is, this is not endianness-swapping.
Well, this is basically the same as the first "reverse()" but it is 64 bit and only needs one immediate mask to be loaded from the instruction stream. GCC creates code without jumps, so this should be pretty fast.
Generic
C code. Using 1 byte input data num as example.
Well this certainly won't be an answer like Matt J's but hopefully it will still be useful.
This is exactly the same idea as Matt's best algorithm except that there's this little instruction called BSWAP which swaps the bytes (not the bits) of a 64-bit number. So b7,b6,b5,b4,b3,b2,b1,b0 becomes b0,b1,b2,b3,b4,b5,b6,b7. Since we are working with a 32-bit number we need to shift our byte-swapped number down 32 bits. This just leaves us with the task of swapping the 8 bits of each byte which is done and voila! we're done.
Timing: on my machine, Matt's algorithm ran in ~0.52 seconds per trial. Mine ran in about 0.42 seconds per trial. 20% faster is not bad I think.
If you're worried about the availability of the instruction BSWAP Wikipedia lists the instruction BSWAP as being added with 80846 which came out in 1989. It should be noted that Wikipedia also states that this instruction only works on 32 bit registers which is clearly not the case on my machine, it very much works only on 64-bit registers.
This method will work equally well for any integral datatype so the method can be generalized trivially by passing the number of bytes desired:
which can then be called like:
The compiler should be able to optimize the extra parameter away (assuming the compiler inlines the function) and for the
sizeof(size_t)
case the right-shift would be removed completely. Note that GCC at least is not able to remove the BSWAP and right-shift if passedsizeof(char)
.I was curious how fast would be the obvious raw rotation. On my machine (i7@2600), the average for 1,500,150,000 iterations was
27.28 ns
(over a a random set of 131,071 64-bit integers).Advantages: the amount of memory needed is little and the code is simple. I would say it is not that large, either. The time required is predictable and constant for any input (128 arithmetic SHIFT operations + 64 logical AND operations + 64 logical OR operations).
I compared to the best time obtained by @Matt J - who has the accepted answer. If I read his answer correctly, the best he has got was
0.631739
seconds for1,000,000
iterations, which leads to an average of631 ns
per rotation.The code snippet I used is this one below:
This is another solution for folks who love recursion.
The idea is simple. Divide up input by half and swap the two halves, continue until it reaches single bit.
Here is a recursive function to solve it. (Note I have used unsigned ints, so it can work for inputs up to sizeof(unsigned int)*8 bits.
This is the output: