SIMD (SSE) instruction for division in GCC

2020-07-13 01:36发布

I'd like to optimize the following snippet using SSE instructions if possible:

/*
 * the data structure
 */
typedef struct v3d v3d;
struct v3d {
    double x;
    double y;
    double z;
} tmp = { 1.0, 2.0, 3.0 };

/*
 * the part that should be "optimized"
 */
tmp.x /= 4.0;
tmp.y /= 4.0;
tmp.z /= 4.0;

Is this possible at all?

3条回答
家丑人穷心不美
2楼-- · 2020-07-13 01:44

Is tmp.x *= 0.25; enough?

Note that for SSE instructions (in case that you want to use them) it's important that:

1) all the memory access is 16 bytes alighed

2) the operations are performed in a loop

3) no int <-> float or float <-> double conversions are performed

4) avoid divisions if possible

查看更多
我欲成王,谁敢阻挡
3楼-- · 2020-07-13 01:52

I've used SIMD extension under windows, but have not yet under linux. That being said you should be able to take advantage of the DIVPS SSE operation which will divide a 4 float vector by another 4 float vector. But you are using doubles, so you'll want the SSE2 version DIVPD. I almost forgot, make sure to build with -msse2 switch.

I found a page which details some SSE GCC builtins. It looks kind of old, but should be a good start.

http://ds9a.nl/gcc-simd/

查看更多
混吃等死
4楼-- · 2020-07-13 02:01

The intrinsic you are looking for is _mm_div_pd. Here is a working example which should be enough to steer you in the right direction:

#include <stdio.h>

#include <emmintrin.h>

typedef struct
{
    double x;
    double y;
    double z;
} v3d;

typedef union __attribute__ ((aligned(16)))
{
    v3d a;
    __m128d v[2];
} u3d;

int main(void)
{
    const __m128d vd = _mm_set1_pd(4.0);
    u3d u = { { 1.0, 2.0, 3.0 } };

    printf("v (before) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);

    u.v[0] = _mm_div_pd(u.v[0], vd);
    u.v[1] = _mm_div_pd(u.v[1], vd);

    printf("v (after) = { %g %g %g }\n", u.a.x, u.a.y, u.a.z);

    return 0;
}
查看更多
登录 后发表回答