I've been trying my hand at optimising some code using Microsoft's SSE intrinsics. One of the biggest problems is the LHS that happens whenever I want to use a constant. There seems to be some info on generating certain constants (here and here - section 13.4), but it's all assembly (which I would rather avoid).
The problem is that when I try to implement the same thing with intrinsics, MSVC complains about incompatible types. Does anyone know of any equivalent tricks using intrinsics?
Example - Generate {1.0,1.0,1.0,1.0}
//pcmpeqw xmm0,xmm0
__m128 t = _mm_cmpeq_epi16( t, t );
//pslld xmm0,25
_mm_slli_epi32(t, 25);
//psrld xmm0,2
return _mm_srli_epi32(t, 2);
This generates a bunch of errors about incompatible types (__m128 vs __m128i). I'm pretty new to this, so I'm probably missing something obvious. Can anyone help?
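For reference, here is a rough sketch of what a type-consistent version of the snippet might look like. It rests on a few assumptions: SSE2 is available (via emmintrin.h), the integer work is done in an __m128i, each shift result is assigned back, and the bits are reinterpreted as floats with _mm_castsi128_ps at the end. The helper name make_ones_ps is made up for illustration. The register is zeroed first only to avoid reading an uninitialised variable, and the compiler is free to fold the whole sequence back into a constant load, so this is not guaranteed to avoid the memory access.

```c
#include <emmintrin.h>  /* SSE2: integer intrinsics and the cast intrinsics */

/* Hypothetical helper (name chosen for illustration only):
   builds {1.0f, 1.0f, 1.0f, 1.0f} from register operations. */
static __m128 make_ones_ps(void)
{
    __m128i t = _mm_setzero_si128();  /* defined starting value for t */
    t = _mm_cmpeq_epi16(t, t);        /* pcmpeqw: every bit set, 0xFFFFFFFF per lane */
    t = _mm_slli_epi32(t, 25);        /* pslld 25: 0xFE000000 per lane */
    t = _mm_srli_epi32(t, 2);         /* psrld 2:  0x3F800000 per lane, the bits of 1.0f */
    return _mm_castsi128_ps(t);       /* reinterpret as floats, no value conversion */
}
```

The cast intrinsic only changes the type the compiler sees; it is not the same as _mm_cvtepi32_ps, which would convert the integer values numerically.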
tl;dr - How do I generate an __m128 vector filled with single-precision float constants using MS intrinsics?
Thanks for reading :)