I've got some code, originally given to me by someone working with MSVC, and I'm trying to get it to work on Clang. Here's the function that I'm having trouble with:
float vectorGetByIndex( __m128 V, unsigned int i )
{
assert( i <= 3 );
return V.m128_f32[i];
}
The error I get is as follows:
Member reference has base type '__m128' is not a structure or union.
I've looked around and found that Clang (and maybe GCC) has a problem with treating __m128 as a struct or union. However I haven't managed to find a straight answer as to how I can get these values back. I've tried using the subscript operator and couldn't do that, and I've glanced around the huge list of SSE intrinsics functions and haven't yet found an appropriate one.
A union is probably the most portable way to do this:
union {
__m128 v; // SSE 4 x float vector
float a[4]; // scalar array of 4 floats
} U;
float vectorGetByIndex(__m128 V, unsigned int i)
{
U u;
assert(i <= 3);
u.v = V;
return u.a[i];
}
Even if SSE4.1 is available and i
is a compile time constant, you can't use pextract
etc. this way:
// broken code starts here
template<unsigned i>
float vectorGetByIndex( __m128 V) {
return _mm_extract_epi32(V, i);
}
// broken code ends here
I don't delete it because it is a useful reminder how to not do things and let it stand as a public humiliation.
Better use
template<unsigned i>
float vectorGetByIndex( __m128 V) {
union {
__m128 v;
float a[4];
} converter;
converter.v = V;
return converter.a[i];
}
which will work regardless of the available instruction set.
As a modification to hirschhornsalz's solution, if i
is a compile-time constant, you could avoid the union path entirely by using a shuffle/store:
template<unsigned i>
float vectorGetByIndex( __m128 V)
{
#ifdef __SSE4_1__
return _mm_extract_epi32(V, i);
#else
float ret;
// shuffle V so that the element that you want is moved to the least-
// significant element of the vector (V[0])
V = _mm_shuffle_ps(V, V, _MM_SHUFFLE(i, i, i, i));
// return the value in V[0]
return _mm_cvtss_f32(V);
#endif
}
The way I use is
union vec { __m128 sse, float f[4] };
float accessmember(__m128 v, int index)
{
vec v.sse = v;
return v.f[index];
}
Seems to work out pretty well for me.