I'm looking at the entry Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup from Bit Twiddling hacks.
I can easily see how the second algorithm in that entry works
static const int MultiplyDeBruijnBitPosition2[32] =
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];
which calculates n = log2 v
where v
is known to be a power of 2. In this case 0x077CB531
is an ordinary De Bruijn sequence, and the rest is obvious.
However, the first algorithm in that entry
static const int MultiplyDeBruijnBitPosition[32] =
{
0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};
v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;
r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];
looks a bit more tricky to me. We begin by snapping v
to a nearest greater 2^n - 1
value. This 2^n - 1
value is then multiplied by 0x07C4ACDD
, which in this case acts the same way as the DeBruijn sequence in the previous algorithm did.
My question is: how do we construct this magic 0x07C4ACDD
sequence? I.e. how do we construct a sequence that can be used to generate unique indices when multiplied by a 2^n - 1
value? For 2^n
multiplier it is just an ordinary De Bruijn sequence, as we can see above, so it is clear where 0x077CB531
came from. But what about 2^n - 1
multiplier 0x07C4ACDD
? I feel like I'm missing something obvious here.
P.S. To clarify my question: I'm not really looking for an algorithm to generate these sequences. I'm more interested in some more-or-less trivial property (if one exists) that makes 0x07C4ACDD
work as we want it to work. For 0x077CB531
the property that makes it work is pretty obvious: it contains all 5-bit combinations "stored" in the sequence with 1-bit stepping (which is basically what De Bruijn sequence is).
The 0x07C4ACDD
, on the other hand, is not a De Bruijn sequence by itself. So, what property were they aiming for when constructing 0x07C4ACDD
(besides the non-constructive "it should make the above algorithm work")? Someone did come up with the above algorithm somehow. So they probably knew that the approach is viable, and that the appropriate sequence exists. How did they know that?
For example, if I were to construct the algorithm for an arbitrary v
, I'd do
v |= v >> 1;
v |= v >> 2;
...
first. Then I'd just do ++v
to turn v
into a power of 2 (let's assume it doesn't overflow). Then I'd apply the first algorithm. And finally I'd do --r
to obtain the final answer. However, these people managed to optimize it: they eliminated the leading ++v
and the trailing --r
steps simply by changing the multiplier and rearranging the table. How did they know it was possible? What is the math behind this optimization?
A De Bruijn sequence of order n over k symbols (and of k^n length) have a property that every possible n-length word appears as consecutive characters in it, some of them with cyclic wrapping. For example, in the case of k=2, n=2, the possible words are 00, 01, 10, 11, and a De Bruijn sequence is 0011. 00, 01, 11 appears in it, 10 with wrapping. This property naturally means that left shifting a De Bruijn sequence (multiplying with power of two) and taking its upper n bits results in a unique number for each power of two multiplier. Then you only need a lookup table to determine which one it is. It works on a similar principle to numbers which are one less than power of two, but the magic number in this case is not a De Bruijn sequence, but an analogue. The defining property simply changes to "every possible n-length word appears as the sum of the first m subsequences of length n, mod 2^n". This property is all that is needed for the algorithm to work. They simply used this different class of magic numbers to speed up the algorithm. I did as well.
One possible method of construction of De Bruijn numbers is the generation of a Hamiltonian path of the De Bruijn graph, Wikipedia provides an example of such a graph. In this case, the nodes are 2^5=32-bit integers, the directed edges are transitions between them, where a transition is a left shift and a binary or operation according to the label of the edge, 0 or 1. There might be a direct analogue to 2^n-1 type magic numbers, it might be worth exploring, but this isn't a way people usually construct such algorithms.
In practice you might try to construct it differently, especially if you want it to behave in a tad different manner. For example, the implementation of leading/trailing number of zeros algorithms on the bit twiddling hacks page can only return values in [0..31]. It needs additional checking for the case of 0, which has 32 zeros. This check requires a branching and can be way too slow on some CPUs.
The way I did it, I used a 64 element lookup table instead of 32, generated random magic numbers, and for each of them I built up a lookup table with power of two inputs, checked its correctness (injectivity), then verified it for all 32-bit numbers. I continued till I encountered a correct magic number. The resulting numbers do not fulfill a property of "every possible n-length word appears", since only 33 numbers appear, which are unique for all 33 possible input.
Exhaustive brute force search sounds slow, especially if good magic numbers are rare, but if we first test known power of two values as inputs, the table is filled quickly, rejection is fast, and the rejection rate is very high. We only need to clear the table after each magic number. In essence I exploited a high rejection rate algorithm to construct magic numbers.
The resulting algorithms are
As for your question of how did they know, they probably didn't. They experimented, tried to change things, just like me. After all, it isn't a big stretch of imagination that 2^n-1 inputs might work instead of 2^n inputs with different magic number and lookup table.
Here, I made a simplified version of my magic number generator code. It checks all possible magic numbers in 5 minutes if we check only for power of two inputs, finding 1024 magic numbers. Checking against other inputs are pointless since they are reduced to 2^n-1 form anyway. Does not construct the table, but it is trivial once you know the magic number.
It's based on the paper Using de Bruijn Sequences to Index a 1 in a Computer Word. I'm going to guess that they did a search for a perfect hash function to map
2^n-1
to[0..31]
. They describe a method for searching for counting zeroes of integers with up to two bits set which involves incrementally building up the multiplier.From: http://www.stmintz.com/ccc/index.php?id=306404
It seems to me that 0x07C4ACDD is a 5-bit de Bruijn sequence.