This question: How to de-interleave bits (UnMortonizing?) has a good answer for extracting one of the two halves of a Morton number (just the odd bits), but I need a solution which extracts both parts (the odd bits and the even bits) in as few operations as possible.
For my use I would need to take a 32 bit int and extract two 16 bit ints, where one is the even bits and the other is the odd bits shifted right by 1 bit, e.g.
input, z: 11101101 01010111 11011011 01101110
output, x: 11100001 10110111 // odd bits shifted right by 1
y: 10111111 11011010 // even bits
There seem to be plenty of solutions using shifts and masks with magic numbers for generating Morton numbers (i.e. interleaving bits), e.g. Interleave bits by Binary Magic Numbers, but I haven't yet found anything for doing the reverse (i.e. de-interleaving).
UPDATE
After re-reading the section from Hacker's Delight on perfect shuffles/unshuffles I found some useful examples which I adapted as follows:
// morton1 - extract even bits
uint32_t morton1(uint32_t x)
{
x = x & 0x55555555;
x = (x | (x >> 1)) & 0x33333333;
x = (x | (x >> 2)) & 0x0F0F0F0F;
x = (x | (x >> 4)) & 0x00FF00FF;
x = (x | (x >> 8)) & 0x0000FFFF;
return x;
}
// morton2 - extract odd and even bits
void morton2(uint32_t *x, uint32_t *y, uint32_t z)
{
*x = morton1(z);
*y = morton1(z >> 1);
}
I think this can still be improved on, both in its current scalar form and also by taking advantage of SIMD, so I'm still interested in better solutions (either scalar or SIMD).
I didn't want to be limited to a fixed size integer and making lists of similar commands with hardcoded constants, so I developed a C++11 solution which makes use of template metaprogramming to generate the functions and the constants. The assembly code generated with
-O3
seems as tight as it can get without using BMI:TL;DR source repo and live demo.
Implementation
Basically every step in the
morton1
function works by shifting and adding to a sequence of constants which look like this:0b0101010101010101
(alternate 1 and 0)0b0011001100110011
(alternate 2x 1 and 0)0b0000111100001111
(alternate 4x 1 and 0)0b0000000011111111
(alternate 8x 1 and 0)If we were to use
D
dimensions, we would have a pattern withD-1
zeros and1
one. So to generate these it's enough to generate consecutive ones and apply some bitwise or:Now that we can generate the constants at compile time for arbitrary dimensions with the following:
With the same type of recursion, we can generate functions for each of the steps of the algorithm
x = (x | (x >> K)) & M
:It remains to answer the question "how many steps do we need?". This depends also on the number of dimensions. In general,
k
steps compute2^k - 1
output bits; the maximum number of meaningful bits for each dimension is given byz = sizeof(T) * 8 / dimensions
, therefore it is enough to take1 + log_2 z
steps. The problem is now that we need this asconstexpr
in order to use it as a template parameter. The best way I found to work around this is to definelog2
via metaprogramming:And finally, we can perform one single call:
Code for the Intel Haswell and later CPUs. You can use the BMI2 instruction set which contains the pext and pdep instructions. These can (among other great things) be used to build your functions.
If you need speed than you can use table-lookup for one byte conversion at once (two bytes table is faster but to big). Procedure is made under Delphi IDE but the assembler/algorithem is the same.
In case someone is using morton codes in 3d, so he needs to read one bit every 3, and 64 bits here is the function I used:
If your processor handles 64 bit ints efficiently, you could combine the operations...