I'm using AVX2 instructions in some C code.
The VPERMD instruction takes two 8-integer vectors, a and idx, and generates a third one, dst, by permuting a based on idx. This seems equivalent to dst[i] = a[idx[i]] for i in 0..7. I'm calling this source-based, because each index selects a position in the source.
However, I have my calculated indices in destination-based form. This is natural for setting an array, and is equivalent to dst[idx[i]] = a[i] for i in 0..7.
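To make the two conventions concrete, here is a minimal scalar sketch in plain C. The function names are hypothetical and chosen only for illustration. The & 7 mask models VPERMD using only the low 3 bits of each index, and the destination-based version assumes idx is a permutation of 0..7 (otherwise some elements of dst are never written).

```c
#include <stdint.h>

/* Source-based (gather-style), a scalar model of VPERMD:
   dst[i] = a[idx[i]]. Only the low 3 bits of each index are used. */
static void permute_source_based(const int32_t a[8], const int32_t idx[8],
                                 int32_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[i] = a[idx[i] & 7];
}

/* Destination-based (scatter-style): dst[idx[i]] = a[i].
   Assumes idx is a permutation of 0..7, so every dst slot is written once. */
static void permute_dest_based(const int32_t a[8], const int32_t idx[8],
                               int32_t dst[8])
{
    for (int i = 0; i < 8; i++)
        dst[idx[i] & 7] = a[i];
}
```

With a = {10 11 12 13 14 15 16 17}, the source-based indices {2 1 0 5 3 4 6 7} and the destination-based indices {2 1 0 4 5 3 6 7} both produce {12 11 10 15 13 14 16 17}.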
How can I convert from destination-based form to the source-based form that VPERMD expects? An example test case is:
{2 1 0 5 3 4 6 7} source-based form.
{2 1 0 4 5 3 6 7} destination-based equivalent
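The conversion amounts to inverting the permutation. The sketch below is only a scalar reference for checking results: it writes to computed array positions, so it does not stay in ymm registers and is not the in-register solution being asked for. The function name is hypothetical, and the input is assumed to be a permutation of 0..7.

```c
#include <stdint.h>

/* Scalar reference: invert a permutation of 0..7, inv[idx[i]] = i.
   Inverting twice returns the original, so the same routine converts
   in either direction between the two index forms. */
static void invert_permutation8(const int32_t idx[8], int32_t inv[8])
{
    for (int i = 0; i < 8; i++)
        inv[idx[i] & 7] = i;
}
```

Applied to {2 1 0 5 3 4 6 7} this gives {2 1 0 4 5 3 6 7}, matching the test case.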
For this conversion, I'm staying in ymm registers, which means destination-based solutions don't work. Inserting each element separately doesn't help either: the insert instructions only take constant indices, so I can't set an element at a computed position.