_mm_shuffle_ps() equivalent for integer vectors (_

Posted 2019-02-16 19:16

Question:

The _mm_shuffle_ps() intrinsic picks two floats from its first input for the low half of the output and two floats from its second input for the high half.

For example:

R = _mm_shuffle_ps(L1, H1, _MM_SHUFFLE(3,2,3,2));

will result in:

R[0] = L1[2];
R[1] = L1[3];
R[2] = H1[2];
R[3] = H1[3];
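
To make the selector encoding explicit, here is a minimal scalar sketch of what the shuffle computes. The function name shuffle_ps_model is hypothetical and only serves as an illustration; it is not part of any intrinsics header. It assumes the usual layout of the 8-bit selector, where each 2-bit field is an element index, the two low fields index the first input and the two high fields index the second.

```c
/* Hypothetical scalar model of _mm_shuffle_ps(a, b, imm8).
 * Each 2-bit field of imm8 is an element index (0..3).
 * The two low output elements are taken from a, the two high ones from b. */
static void shuffle_ps_model(const float a[4], const float b[4],
                             unsigned imm8, float r[4])
{
    r[0] = a[imm8 & 3];          /* bits 1:0 select from a */
    r[1] = a[(imm8 >> 2) & 3];   /* bits 3:2 select from a */
    r[2] = b[(imm8 >> 4) & 3];   /* bits 5:4 select from b */
    r[3] = b[(imm8 >> 6) & 3];   /* bits 7:6 select from b */
}
```

With this model, _MM_SHUFFLE(3,2,3,2) is the value 0xEE, which reproduces the four assignments listed above.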

Is there a similar intrinsic for integer data, one that takes two __m128i variables and a selector for combining them?

The _mm_shuffle_epi32() intrinsic takes just one 128-bit vector instead of two.

Answer 1:

Nope, there is no integer equivalent to this. So you have to either emulate it, or cheat.

One method is to apply _mm_shuffle_epi32() to both A and B, then mask out the desired elements and OR them back together.

That tends to be messy and takes around 5 instructions (or 3 if you use the SSE4.1 blend instructions).
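
For reference, the SSE2-only mask-and-OR variant might look roughly like the sketch below. The function name shuffle2_epi32_sse2 is made up for illustration, and the selector is fixed to _MM_SHUFFLE(3,2,3,2) because the shuffle immediate has to be a compile-time constant.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _MM_SHUFFLE */

/* Hypothetical helper: emulates _mm_shuffle_ps(a, b, _MM_SHUFFLE(3,2,3,2))
 * on integer vectors using only SSE2 (two shuffles, AND, ANDNOT, OR). */
static __m128i shuffle2_epi32_sse2(__m128i a, __m128i b)
{
    /* All-ones in the two low 32-bit lanes, zero in the two high lanes. */
    const __m128i lo_mask = _mm_set_epi32(0, 0, -1, -1);

    __m128i sa = _mm_shuffle_epi32(a, _MM_SHUFFLE(3, 2, 3, 2)); /* {a2,a3,a2,a3} */
    __m128i sb = _mm_shuffle_epi32(b, _MM_SHUFFLE(3, 2, 3, 2)); /* {b2,b3,b2,b3} */

    __m128i lo = _mm_and_si128(sa, lo_mask);     /* keep low lanes of sa  */
    __m128i hi = _mm_andnot_si128(lo_mask, sb);  /* keep high lanes of sb */
    return _mm_or_si128(lo, hi);                 /* {a2,a3,b2,b3}         */
}
```

The mask constant usually ends up as a load from memory, which is part of why this version is less attractive than the alternatives.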

Here's the SSE4.1 solution with 3 instructions:

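// _mm_blend_epi16 is SSE4.1 and is declared in <smmintrin.h>
__m128i A = _mm_set_epi32(13,12,11,10);  // A = {10,11,12,13}, lowest element first
__m128i B = _mm_set_epi32(23,22,21,20);  // B = {20,21,22,23}

A = _mm_shuffle_epi32(A,_MM_SHUFFLE(3,2,3,2));  // A = {12,13,12,13}
B = _mm_shuffle_epi32(B,_MM_SHUFFLE(3,2,3,2));  // B = {22,23,22,23}

// Low four 16-bit words from A, high four from B: C = {12,13,22,23}
__m128i C = _mm_blend_epi16(A,B,0xf0);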

The method I prefer is to cheat and use the floating-point shuffle, like this:

__m128i Ai,Bi,Ci;
__m128  Af,Bf,Cf;

Af = _mm_castsi128_ps(Ai);
Bf = _mm_castsi128_ps(Bi);
Cf = _mm_shuffle_ps(Af,Bf,_MM_SHUFFLE(3,2,3,2));
Ci = _mm_castps_si128(Cf);

What this does is reinterpret the integer vectors as floating-point so that the float shuffle can be used, then reinterpret the result back as an integer vector.

Note that these "conversions" are bitwise conversions (aka reinterpretations). No conversion is actually done and they don't map to any instructions. In the assembly, there is no distinction between an integer and a floating-point SSE register. These cast intrinsics exist only to get around the type safety imposed by C/C++.
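
If you use this trick in several places, it can be wrapped up once. Below is a small sketch; the macro name MM_SHUFFLE2_EPI32 is hypothetical and not something the intrinsics headers provide. It is written as a macro rather than a function because the selector passed to _mm_shuffle_ps() has to be a compile-time constant.

```c
#include <emmintrin.h>  /* SSE2: __m128i, cast intrinsics; pulls in _mm_shuffle_ps */

/* Hypothetical wrapper: two-input 32-bit integer shuffle built on
 * _mm_shuffle_ps. The casts only reinterpret bits; imm must be a
 * compile-time constant such as _MM_SHUFFLE(3,2,3,2). */
#define MM_SHUFFLE2_EPI32(a, b, imm)                      \
    _mm_castps_si128(_mm_shuffle_ps(_mm_castsi128_ps(a),  \
                                    _mm_castsi128_ps(b),  \
                                    (imm)))
```

For example, MM_SHUFFLE2_EPI32(Ai, Bi, _MM_SHUFFLE(3,2,3,2)) gives the same result as the four-line sequence above.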

However, be aware that on some processors this approach incurs extra latency (a bypass delay) for moving data back and forth between the integer and floating-point SIMD execution units. In that case it will be more expensive than the shuffle instruction alone.



Tags: c sse