The intrinsic _mm_slli_si128 does a logical left shift of a 128-bit register, but it is restricted to immediate shift counts and shifts by bytes, not bits.
I can use an intrinsic like _mm_sll_epi64 or _mm_sll_epi32 to shift left a set of values within the __m128i register, but these don't carry the "overflow" bits from one element into the next.
For a shift by N bits, I imagine I could do something like:
- _mm_sll_epi64
- _mm_srl_epi64 (for the bits I want to carry: move them into the low-order bits)
- shuffle the srl result
- or these together
(but I would probably also have to include checks of N relative to 64).
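The steps above can be sketched roughly as follows. This is only a minimal sketch, assuming SSE2 and a run-time count in the range 0 to 127; the names shl128_sketch and store128_sketch are hypothetical helpers made up for illustration, and a byte shift (_mm_slli_si128) stands in for the shuffle step.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: logical left shift of a 128-bit value by n bits.
 * Assumes 0 <= n <= 127; n is only known at run time. */
static __m128i shl128_sketch(__m128i v, unsigned n)
{
    if (n >= 64) {
        /* Move the low 64 bits into the high half (low half becomes 0),
         * then shift by the remaining n - 64 bits. */
        v = _mm_slli_si128(v, 8);
        return _mm_sll_epi64(v, _mm_cvtsi32_si128((int)(n - 64)));
    }

    /* Shift both 64-bit lanes left by n; bits leaving a lane are lost. */
    __m128i shifted = _mm_sll_epi64(v, _mm_cvtsi32_si128((int)n));

    /* Recover the lost bits: shift right by 64 - n. For n == 0 the count
     * is 64, and a count above 63 yields all zeros, so no special case. */
    __m128i carry = _mm_srl_epi64(v, _mm_cvtsi32_si128((int)(64 - n)));

    /* Move the low lane's carry into the high lane; the high lane's own
     * carry falls off the top of the register. */
    carry = _mm_slli_si128(carry, 8);

    return _mm_or_si128(shifted, carry);
}

/* Hypothetical helper: copy a register out as out[0] = low, out[1] = high. */
static void store128_sketch(__m128i v, uint64_t out[2])
{
    _mm_storeu_si128((__m128i *)out, v);
}
```

The only check of N relative to 64 is the single branch at the top; the N == 0 case falls out of how the variable-count shifts treat counts larger than 63.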
Is there a better way?
Not your ideal solution, but if you want to rotate or shift an SSE register by a number of bits that is a multiple of 8, then the SSSE3 PSHUFB instruction (and the _mm_shuffle_epi8() intrinsic) can help. It takes a second SSE register as an input; each byte in that register holds a value that is used to index the bytes in the first input register.

This came up as a side issue in a blog post (of mine) on unusual C preprocessor uses. For the 127 different shift offsets, there are four different optimal sequences of SSE2 instructions for a bit shift. The preprocessor makes it reasonable to construct a shift function that amounts to a 129-way switch statement. Pardon the raw code here; I'm unfamiliar with posting code directly here. Check the blog post for an explanation of what's going on.
xm_shr amounts to the above but swapping "shl" and "shr" everywhere in the F[1256] macros. HTH.
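For the byte-granular case that PSHUFB covers, here is a minimal sketch of building the index register at run time. It assumes SSSE3 is available; shl128_bytes_sketch is a hypothetical name, and the target attribute is a GCC/Clang-specific assumption so the file builds without -mssse3. It is not the macro-based code from the blog post.

```c
#include <tmmintrin.h>  /* SSSE3: _mm_shuffle_epi8 */
#include <stdint.h>

/* Hypothetical helper: shift a 128-bit value left by k bytes (k * 8 bits),
 * with k chosen at run time. Any k >= 16 yields all zeros. */
#if defined(__GNUC__) || defined(__clang__)
__attribute__((target("ssse3")))
#endif
static __m128i shl128_bytes_sketch(__m128i v, unsigned k)
{
    uint8_t idx[16];

    /* Output byte i takes input byte i - k. An index with the top bit
     * set (0x80) makes PSHUFB write zero to that byte instead. */
    for (unsigned i = 0; i < 16; ++i)
        idx[i] = (i >= k) ? (uint8_t)(i - k) : 0x80;

    __m128i ctrl = _mm_loadu_si128((const __m128i *)idx);
    return _mm_shuffle_epi8(v, ctrl);
}
```

Replacing the index computation with (i - k) & 15 would turn the shift into a byte rotate, since PSHUFB only looks at the low four bits of each index when the top bit is clear.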