I want to convert an 8-bit integer to an array of size 8, with each element containing the value of one bit of the integer.
For example, I have `int8_t x = 8;`
I want to convert this to `int8_t array_x[8] = {0,0,0,0,1,0,0,0};`
This has to be done efficiently, since this calculation is part of a signal processing block. Is there an efficient way to do this? I did check the blend instruction, but it didn't suit my requirement when the array elements are 8 bits wide. The development platform is AMD Ryzen.
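For reference, the requested mapping can be pinned down with a plain scalar loop. This is a minimal sketch, not an optimized solution: the function name `bits_to_bytes_ref` is hypothetical, the input is taken as `uint8_t` (an assumption, to avoid sign-extension surprises with `int8_t`), and the byte order follows the question's example, where `out[0]` holds the most significant bit.

```c
#include <stdint.h>

/* Hypothetical reference implementation.
   out[0] receives bit 7 of x and out[7] receives bit 0,
   so x = 8 (0b00001000) gives {0,0,0,0,1,0,0,0}. */
void bits_to_bytes_ref(uint8_t x, int8_t out[8])
{
    for (int i = 0; i < 8; i++) {
        out[i] = (int8_t)((x >> (7 - i)) & 1);  /* isolate bit (7 - i) */
    }
}
```

A loop like this is handy as a correctness check for faster variants, since every output byte can be compared against it for all 256 inputs.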
See also: "Inverse movemask" for a single byte with `0x00:0x01` formatted results, with SIMD but without BMI2.

The first example at the end of this answer shows how to use the BMI2 instruction `pdep` to compute the 8-byte array.

Note that on Intel Haswell processors and newer, the `pdep` instruction has a throughput of one instruction per cycle and a latency of 3 cycles, which is fast. Unfortunately, on AMD Ryzen this instruction is relatively slow: both the latency and the reciprocal throughput are 18 cycles. For AMD Ryzen it is better to replace the `pdep` instruction with a multiplication and a few bitwise operations, which are quite fast on AMD Ryzen; see the second example at the end of this answer.

See also here and here for efficient inverse movemask computations, with a scalar source and a 256-bit AVX2 vector destination.
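A minimal sketch of the `pdep` idea, plus `_pext_u64` for the reverse direction (8 bytes back to 8 bits). Assumptions: GCC or Clang on x86-64 (little-endian), the hypothetical function names `bits_to_bytes_pdep` and `bytes_to_bits_pext`, and the question's MSB-first order, which is why the result of `_pdep_u64` is byte-swapped with the GCC/Clang builtin `__builtin_bswap64`.

```c
#include <immintrin.h>  /* _pdep_u64, _pext_u64 */
#include <stdint.h>
#include <string.h>     /* memcpy */

/* Hypothetical helper. target("bmi2") lets this compile without -mbmi2,
   but it must only be called on a CPU that supports BMI2. */
__attribute__((target("bmi2")))
void bits_to_bytes_pdep(uint8_t x, int8_t out[8])
{
    /* Deposit bit i of x into bit 0 of byte i (LSB-first byte order). */
    uint64_t y = _pdep_u64(x, 0x0101010101010101ULL);
    /* Reverse the byte order so out[0] holds the most significant bit
       (assumes a little-endian target such as x86). */
    y = __builtin_bswap64(y);
    memcpy(out, &y, 8);
}

/* Hypothetical inverse: gather bit 0 of each byte back into one byte. */
__attribute__((target("bmi2")))
uint8_t bytes_to_bits_pext(const int8_t in[8])
{
    uint64_t y;
    memcpy(&y, in, 8);
    y = __builtin_bswap64(y);  /* undo the MSB-first byte order */
    return (uint8_t)_pext_u64(y, 0x0101010101010101ULL);
}
```

Since calling these on a CPU without BMI2 would fault, a runtime check such as `__builtin_cpu_supports("bmi2")` may be needed before dispatching to them.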
Instead of working with 8 bits and 8 bytes at a time, it might be more efficient to reorganize your algorithm to work with 4 x 8 bits and 4 x 8 bytes per step. In that case the full AVX2 vector width of 256 bits can be utilized, which might be faster.
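A sketch of the 4 x 8-bit variant, assuming AVX2 and GCC/Clang. It uses one common inverse-movemask technique (broadcast, byte shuffle, AND, compare), which is not necessarily the method of the linked answers; the name `bits_to_bytes_avx2` is hypothetical. The 4 bytes of `x` are expanded in memory order (byte 0 first), and each group of 8 output bytes is MSB-first, like the scalar version.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: 32 input bits -> 32 output bytes of 0 or 1.
   Must only be called on a CPU that supports AVX2. */
__attribute__((target("avx2")))
void bits_to_bytes_avx2(uint32_t x, int8_t out[32])
{
    /* Both 128-bit lanes now hold the 4 bytes of x at lane offsets 0..3. */
    __m256i v = _mm256_set1_epi32((int)x);
    /* vpshufb indexes within each lane: the low lane picks bytes 0 and 1,
       the high lane picks bytes 2 and 3, each repeated 8 times. */
    const __m256i shuf = _mm256_setr_epi8(
        0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1,
        2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3);
    v = _mm256_shuffle_epi8(v, shuf);
    /* Byte k of each 8-byte group tests bit (7 - k): 0x80, 0x40, ..., 0x01. */
    const __m256i bit = _mm256_set1_epi64x((long long)0x0102040810204080ULL);
    v = _mm256_and_si256(v, bit);
    v = _mm256_cmpeq_epi8(v, bit);                 /* 0xFF where set, else 0x00 */
    v = _mm256_and_si256(v, _mm256_set1_epi8(1));  /* map 0xFF to 0x01 */
    _mm256_storeu_si256((__m256i *)out, v);
}
```

If signed 0/-1 bytes were acceptable downstream, the final AND could be dropped and the `cmpeq` result stored directly.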
Peter Cordes shows that the `pext` instruction can be used for the conversion in the opposite direction: from 8 bytes to 8 bits.

Code example with the `pdep` instruction:

The output is:
Code example for AMD Ryzen processors:
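A multiply-based sketch that avoids `pdep`, assuming a little-endian target and the question's MSB-first order; the name `bits_to_bytes_mul` is hypothetical, and the constant `0x8040201008040201` is one known choice that may differ from the original answer's code. Multiplying by this constant places copies of `x` at bit offsets 0, 9, 18, ..., 63. The copies do not overlap, so no carries occur, and bit 7 of byte k ends up holding bit (7 - k) of `x`.

```c
#include <stdint.h>
#include <string.h>  /* memcpy */

/* Hypothetical helper: no BMI2 needed, only a multiply, shift and AND. */
void bits_to_bytes_mul(uint8_t x, int8_t out[8])
{
    /* Non-overlapping copies of x at offsets 9*j; the last copy is
       truncated to its lowest bit, which is the only one needed. */
    uint64_t y = (uint64_t)x * 0x8040201008040201ULL;
    /* Move bit 7 of every byte down to bit 0 and clear the rest. */
    y = (y >> 7) & 0x0101010101010101ULL;
    /* On little-endian, byte 0 (out[0]) now holds the MSB of x. */
    memcpy(out, &y, 8);
}
```

For example, `bits_to_bytes_mul(8, a)` fills `a` with `{0,0,0,0,1,0,0,0}`, matching the question.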