OK, it may sound a bit complicated, but this is what I'm trying to do:
- Take e.g. 10101010101
- And return { 0, 2, 4, 6, 8, 10 }, an array with all of the positions of the bits which are set
This is my code:
UINT DQBitboard::firstBit(U64 bitboard)
{
    static const int index64[64] = {
        63,  0, 58,  1, 59, 47, 53,  2,
        60, 39, 48, 27, 54, 33, 42,  3,
        61, 51, 37, 40, 49, 18, 28, 20,
        55, 30, 34, 11, 43, 14, 22,  4,
        62, 57, 46, 52, 38, 26, 32, 41,
        50, 36, 17, 19, 29, 10, 13, 21,
        56, 45, 25, 31, 35, 16,  9, 12,
        44, 24, 15,  8, 23,  7,  6,  5 };
    static const U64 debruijn64 = 0x07EDD5E59A4E28C2ULL;
    #pragma warning (disable: 4146)
    return index64[((bitboard & -bitboard) * debruijn64) >> 58];
}

vector<UINT> DQBitboard::bits(U64 bitboard)
{
    vector<UINT> res;
    while (bitboard)
    {
        UINT first = DQBitboard::firstBit(bitboard);
        res.push_back(first);
        bitboard &= ~(1ULL<<first);
    }
    return res;
}
And the code surely does work.
My questions are:
- Is there any faster implementation you have in mind?
- Do you notice anything that could be optimized? If so, what?
Hints:
- UINT is a typedef of unsigned int
- U64 is a typedef of unsigned long long
- Both methods are static inline.
Here is another suggestion that can be profiled (it can be combined with other suggestions for further optimization). Note that the loop here is O(number of set bits).
The fastest I can think of right now would be using a pre-generated map array of all numbers (it doesn't have to be all numbers; you can, for example, break the number into 8-bit or 16-bit parts and then concatenate the returned arrays, adding the proper offsets to account for the actual positions of the bits).
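To make the two ideas above concrete, here is a rough sketch of how each might look. This is only an illustration under some assumptions, not the code the answers originally had in mind: firstBit is copied from the question as a free function, while bitsLoop, bitsTable, ByteEntry and makeByteTable are hypothetical names invented here. The first variant runs once per set bit and clears the lowest set bit with b &= b - 1; the second splits the 64-bit value into 8-bit parts and looks each part up in a 256-entry table built on first use.

```cpp
#include <array>
#include <cstdint>
#include <vector>

using U64 = std::uint64_t;   // assumed equivalent to the question's U64 typedef
using UINT = unsigned int;

// De Bruijn bit scan, copied from the question (b must be non-zero).
static UINT firstBit(U64 b)
{
    static const int index64[64] = {
        63,  0, 58,  1, 59, 47, 53,  2,
        60, 39, 48, 27, 54, 33, 42,  3,
        61, 51, 37, 40, 49, 18, 28, 20,
        55, 30, 34, 11, 43, 14, 22,  4,
        62, 57, 46, 52, 38, 26, 32, 41,
        50, 36, 17, 19, 29, 10, 13, 21,
        56, 45, 25, 31, 35, 16,  9, 12,
        44, 24, 15,  8, 23,  7,  6,  5 };
    static const U64 debruijn64 = 0x07EDD5E59A4E28C2ULL;
    // (0 - b) is the two's complement negation, so b & (0 - b) isolates the lowest set bit.
    return static_cast<UINT>(index64[((b & (0 - b)) * debruijn64) >> 58]);
}

// Variant 1 (hypothetical name): one iteration per set bit.
std::vector<UINT> bitsLoop(U64 b)
{
    std::vector<UINT> res;
    while (b) {
        res.push_back(firstBit(b));
        b &= b - 1;              // clears the lowest set bit without a shift
    }
    return res;
}

// Variant 2: per-byte lookup table (ByteEntry and makeByteTable are hypothetical).
struct ByteEntry {
    unsigned char count;         // number of set bits in this byte value
    unsigned char pos[8];        // their positions, in ascending order
};

static std::array<ByteEntry, 256> makeByteTable()
{
    std::array<ByteEntry, 256> t{};          // zero-initialized
    for (unsigned v = 0; v < 256; ++v)
        for (unsigned bit = 0; bit < 8; ++bit)
            if (v & (1u << bit))
                t[v].pos[t[v].count++] = static_cast<unsigned char>(bit);
    return t;
}

std::vector<UINT> bitsTable(U64 b)
{
    static const std::array<ByteEntry, 256> table = makeByteTable();
    std::vector<UINT> res;
    for (UINT byteIdx = 0; byteIdx < 8; ++byteIdx) {
        const ByteEntry& e = table[(b >> (8 * byteIdx)) & 0xFF];
        for (unsigned k = 0; k < e.count; ++k)
            res.push_back(e.pos[k] + 8 * byteIdx);   // offset by the byte's position
    }
    return res;
}
```

For the example from the question, both bitsLoop(0x555) and bitsTable(0x555) should give { 0, 2, 4, 6, 8, 10 }, since 0x555 is 10101010101 in binary. Which one is actually faster depends on how many bits are typically set and on the target machine, so it is worth profiling both against the original.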