How to encode/decode morton codes(z-order) given [x, y] as 32bit unsigned integers producing 64bit morton code, and vice verse ? I do have xy2d and d2xy but only for coordinates that are 16bits wide producing 32bit morton number. Searched a lot in net, but couldn't find. Please help.
相关问题
- Multiple sockets for clients to connect to
- What is the best way to do a search in a large fil
- glDrawElements only draws half a quad
- Index of single bit in long integer (in C) [duplic
- Equivalent of std::pair in C
The naïve code would be the same irregardless of the bit count. If you don't need super fast bit twiddling version, this will do
If you need faster bit twiddling, then this one should work. Note that x and y have to be 64bit variables.
If it is possible for you to use architecture specific instructions you'll likely be able to accelerate the operation beyond what is possible using bit-twiddeling hacks:
For example if you write code for the Intel Haswell and later CPUs you can use the BMI2 instruction set which contains the
pext
andpdep
instructions. These can (among other great things) be used to build your functions.Here is a complete example (tested with GCC):
If you have to support earlier CPUs or the ARM platform not all is lost. You may still get at least get help for the xy_to_morton function from instructions specific for cryptography.
A lot of CPUs have support for carry-less multiplication these days. On ARM that'll be
vmul_p8
from the NEON instruction set. On X86 you'll find it asPCLMULQDQ
from the CLMUL instruction set (available since 2010).The trick here is, that a carry-less multiplication of a number with itself will return a bit-pattern that contains the original bits of the argument with zero-bits interleaved. So it is identical to the _pdep_u32(x,0x55555555) shown above. E.g. it turns the following byte:
Into:
Now you can build the xy_to_morton function as (here shown for CLMUL instruction set):
_mm_clmulepi64_si128
generates a 128 bit result of which we only use the lower 64 bits. So you can even improve upon the version above and use a single _mm_clmulepi64_si128 do do the job.That is as good as you can get on mainstream platforms (e.g. modern ARM with NEON and x86). Unfortunately I don't know of any trick to speed up the morton_to_xy function using the cryptography instructions and I tried really hard for several month.