The Haswell architecture comes with several new instructions. One of them is PEXT
(parallel bits extract), whose functionality is explained by this image (source here):
It takes a value r2 and a mask r3 and puts the extracted bits of r2 into r1.
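To make those semantics concrete, here is a naive reference loop in C++11. It is only a sketch of what the instruction computes: the name pext_reference is made up for illustration, it assumes an unsigned integer type, and nothing here guarantees that a compiler will turn it into a single PEXT.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not an intrinsic, not from any library):
// collects the bits of `value` selected by `mask` and packs them
// contiguously into the low bits of the result, lowest mask bit first.
template <class T>
T pext_reference(T value, T mask)
{
    static_assert(std::is_unsigned<T>::value,
                  "T must be an unsigned integer type");
    T result = 0;
    T out_bit = 1;  // next destination bit in the result
    while (mask != 0) {
        // Isolate the lowest set bit of the mask.
        const T lowest = static_cast<T>(mask & (~mask + 1));
        if (value & lowest) {
            result = static_cast<T>(result | out_bit);
        }
        out_bit = static_cast<T>(out_bit << 1);
        // Clear the lowest set bit of the mask.
        mask = static_cast<T>(mask & (mask - 1));
    }
    return result;
}
```

For example, pext_reference<std::uint32_t>(0x12345678u, 0xFF00FFF0u) yields 0x12567: the nibbles 1, 2, 5, 6, 7 selected by the mask end up packed together at the bottom of the result. The loop runs once per set bit in the mask, so it is far slower than the hardware instruction.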
My question is the following: what would be the equivalent code, written as an optimized templated function in pure standard C++11, that compilers would be likely to optimize into this instruction in the future?
Here is some code from Matthew Fioravante's stdcxx-bitops GitHub repo that was floated to the
std-proposals mailing list as a preliminary proposal to add a constexpr
bitwise operations library for C++.