I'd like to optimize the following snippet using SSE instructions if possible:
/*
* the data structure
*/
typedef struct v3d v3d;
struct v3d {
    double x;
    double y;
    double z;
} tmp = { 1.0, 2.0, 3.0 };
/*
* the part that should be "optimized"
*/
tmp.x /= 4.0;
tmp.y /= 4.0;
tmp.z /= 4.0;
Is this possible at all?
Is
tmp.x *= 0.25;
enough?
Note that for SSE instructions (in case you want to use them) it's important that:
1) all memory accesses are 16-byte aligned
2) the operations are performed in a loop
3) no int <-> float or float <-> double conversions are performed
4) divisions are avoided if possible
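The points above can be sketched with SSE2 intrinsics roughly like this. It is only an illustration under a few assumptions: the helper name `scale_quarter` is made up, the array is assumed to be 16-byte aligned with an even number of elements, and the compiler is assumed to ship `<emmintrin.h>`. Since 0.25 is a power of two, multiplying by it gives the same result as dividing by 4.0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: scale n doubles in place by 0.25.
 * Assumes data is 16-byte aligned and n is even. */
static void scale_quarter(double *data, size_t n)
{
    assert(((uintptr_t)data % 16) == 0);  /* point 1: aligned access */
    assert(n % 2 == 0);

    const __m128d quarter = _mm_set1_pd(0.25);
    for (size_t i = 0; i < n; i += 2) {      /* point 2: work in a loop */
        __m128d v = _mm_load_pd(data + i);   /* aligned load of 2 doubles */
        v = _mm_mul_pd(v, quarter);          /* point 4: multiply, don't divide */
        _mm_store_pd(data + i, v);           /* aligned store */
    }
}
```

A three-double struct like `v3d` does not fit the even-count assumption on its own; it would need a padding element, or several vectors packed into one aligned array.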
I've used SIMD extensions under Windows, but not yet under Linux. That being said, you should be able to take advantage of the DIVPS SSE operation, which divides one vector of 4 floats by another. But you are using doubles, so you'll want the SSE2 version, DIVPD, which works on 2 doubles at a time. I almost forgot: make sure to build with the -msse2 switch.
I found a page which details some SSE GCC builtins. It looks kind of old, but should be a good start.
http://ds9a.nl/gcc-simd/
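Besides the builtins, GCC (and Clang) also accept generic vector types declared with the `vector_size` attribute, and ordinary arithmetic operators work on them element-wise. A minimal sketch, assuming a GCC-compatible compiler; the names `v2df` and `div_by_four` are only chosen for illustration:

```c
/* GCC/Clang vector extension: two doubles in one 16-byte vector. */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical helper: divide both lanes by 4.0. */
static v2df div_by_four(v2df v)
{
    const v2df four = { 4.0, 4.0 };
    /* Element-wise division; on x86 with SSE2 enabled the compiler
     * is expected to use a packed-double divide here. */
    return v / four;
}
```

Individual lanes can be read back with array-style subscripts, e.g. `r[0]` and `r[1]`.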
The intrinsic you are looking for is _mm_div_pd. Here is a working example which should be enough to steer you in the right direction: