Using SSE instructions

Posted 2019-01-31 00:44

I have a loop written in C++ which is executed for each element of a big integer array. Inside the loop, I mask some bits of the integer and then find the min and max values. I heard that if I use SSE instructions for these operations it will run much faster compared to a normal loop written using bitwise AND and if-else conditions. My question is: should I go for these SSE instructions? Also, what happens if my code runs on a different processor? Will it still work, or are these instructions processor-specific?
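For reference, the plain loop being described probably looks something like the sketch below (the names and the exact mask are assumptions, not taken from the question):

#include <algorithm>
#include <climits>

// Hypothetical scalar version: mask each element, then track the running min and max.
void masked_min_max(const int* data, int n, int mask, int& out_min, int& out_max)
{
    out_min = INT_MAX;
    out_max = INT_MIN;
    for (int i = 0; i < n; ++i)
    {
        int v = data[i] & mask;            // bitwise AND with the mask
        out_min = std::min(out_min, v);    // the if-else part of the original loop
        out_max = std::max(out_max, v);
    }
}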

15 answers
何必那么认真
#2 · 2019-01-31 01:39

Have a look at inline assembler for C/C++; here is a DDJ article. Unless you are 100% certain your program will run on a compatible platform, you should follow the recommendations many others have given here.

唯我独甜
#3 · 2019-01-31 01:41

SIMD intrinsics (such as SSE2) can speed this sort of thing up but take expertise to use correctly. They are very sensitive to alignment and pipeline latency; careless use can make performance even worse than it would have been without them. You'll get a much easier and more immediate speedup from simply using cache prefetching to make sure all your ints are in L1 in time for you to operate on them.

Unless your function needs a throughput of better than 100,000,000 integers per second, SIMD probably isn't worth the trouble for you.
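To make the trade-off concrete, a hand-written intrinsics version of the masked min/max loop might look like the sketch below. It uses _mm_min_epi32/_mm_max_epi32, which need SSE4.1 (plain SSE2 would have to do compare-and-blend instead), and it assumes n is a multiple of 4 and the data is 16-byte aligned; that alignment requirement is exactly the kind of care this answer is warning about.

#include <smmintrin.h>   // SSE4.1 intrinsics (_mm_min_epi32 / _mm_max_epi32)
#include <algorithm>
#include <climits>

// Hypothetical SSE sketch: process four ints per iteration.
// Assumes n % 4 == 0 and data is 16-byte aligned.
void masked_min_max_sse(const int* data, int n, int mask, int& out_min, int& out_max)
{
    __m128i vmask = _mm_set1_epi32(mask);
    __m128i vmin  = _mm_set1_epi32(INT_MAX);
    __m128i vmax  = _mm_set1_epi32(INT_MIN);

    for (int i = 0; i < n; i += 4)
    {
        __m128i v = _mm_load_si128(reinterpret_cast<const __m128i*>(data + i));
        v    = _mm_and_si128(v, vmask);   // mask four elements at once
        vmin = _mm_min_epi32(vmin, v);    // per-lane minimum
        vmax = _mm_max_epi32(vmax, v);    // per-lane maximum
    }

    // Reduce the four lanes down to scalar results.
    alignas(16) int mins[4], maxs[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(mins), vmin);
    _mm_store_si128(reinterpret_cast<__m128i*>(maxs), vmax);
    out_min = *std::min_element(mins, mins + 4);
    out_max = *std::max_element(maxs, maxs + 4);
}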

Deceive 欺骗
#4 · 2019-01-31 01:43

Although it is true that SSE is specific to some processors (SSE may be relatively safe, SSE2 much less in my experience), you can detect the CPU at runtime, and load the code dynamically depending on the target CPU.
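One common shape for that runtime dispatch is a function pointer chosen once at startup. A minimal sketch, assuming GCC or Clang (the function names are made up for illustration):

#include <cstdio>

// Two interchangeable implementations of the same operation.
static int sum_scalar(const int* p, int n)
{
    int s = 0;
    for (int i = 0; i < n; ++i)
        s += p[i];
    return s;
}

static int sum_sse2(const int* p, int n)
{
    // A real build would use SSE2 intrinsics here (compiled with -msse2);
    // the scalar fallback just keeps this sketch self-contained.
    return sum_scalar(p, n);
}

int main()
{
    // __builtin_cpu_supports is a GCC/Clang builtin that reads the CPUID flags for you.
    int (*sum_impl)(const int*, int) =
        __builtin_cpu_supports("sse2") ? sum_sse2 : sum_scalar;

    const int data[] = { 1, 2, 3, 4 };
    std::printf("sum = %d\n", sum_impl(data, 4));
    return 0;
}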

女痞
#5 · 2019-01-31 01:43

Just to add briefly to what has been said before about different SSE versions being available on different CPUs: This can be checked by looking at the respective feature flags returned by the CPUID instruction (see e.g. Intel's documentation for details).
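For example, a small sketch using GCC/Clang's <cpuid.h> helper; the bit positions are the ones documented for CPUID leaf 1:

#include <cpuid.h>   // GCC/Clang wrapper around the CPUID instruction
#include <cstdio>

int main()
{
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))   // leaf 1: feature flags
    {
        std::printf("SSE:    %s\n", (edx & (1u << 25)) ? "yes" : "no");
        std::printf("SSE2:   %s\n", (edx & (1u << 26)) ? "yes" : "no");
        std::printf("SSE4.1: %s\n", (ecx & (1u << 19)) ? "yes" : "no");
    }
    return 0;
}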

冷血范
#6 · 2019-01-31 01:45

Write code that helps the compiler understand what you are doing. GCC will understand and optimize SSE code such as this:

union Vector4f
{
        // Easy constructor, defaulted to black/0 vector
    Vector4f(float a = 0, float b = 0, float c = 0, float d = 1.0f):
        X(a), Y(b), Z(c), W(d) { }

        // Cast operator, for []
    inline operator float* ()
    { 
        return (float*)this;
    }

        // Const cast operator, for const []
    inline operator const float* () const
    { 
        return (const float*)this;
    }

    // ---------------------------------------- //

    inline Vector4f& operator += (const Vector4f &v)
    {
        for(int i=0; i<4; ++i)
            (*this)[i] += v[i];

        return *this;
    }

    inline Vector4f& operator += (float t)
    {
        for(int i=0; i<4; ++i)
            (*this)[i] += t;

        return *this;
    }

        // Vertex / Vector 
        // Lower case xyzw components
    struct {
        float x, y, z;
        float w;
    };

        // Upper case XYZW components
    struct {
        float X, Y, Z;
        float W;
    };
};

Just don't forget to include -msse -msse2 in your build flags!
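For illustration, a call site like this (hypothetical, not part of the original answer) is the kind of loop GCC can then turn into packed addps instructions:

// With e.g. g++ -O2 -msse -msse2, the loop below can compile to SSE adds.
Vector4f accumulate(const Vector4f* vectors, int count)
{
    Vector4f sum(0, 0, 0, 0);
    for (int i = 0; i < count; ++i)
        sum += vectors[i];
    return sum;
}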

走好不送
#7 · 2019-01-31 01:47

I don't recommend doing this yourself unless you're fairly proficient with assembly. Using SSE will, more than likely, require careful reorganization of your data, as Skizz points out, and the benefit is often questionable at best.

It would probably be much better for you to write very small loops and keep your data very tightly organized and just rely on the compiler doing this for you. Both the Intel C Compiler and GCC (since 4.1) can auto-vectorize your code, and will probably do a better job than you. (Just add -ftree-vectorize to your CXXFLAGS.)
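As a concrete (hypothetical) example, a loop as simple as the one below is exactly the shape the auto-vectorizer handles well:

// Compile with something like: g++ -O2 -ftree-vectorize -msse2 mask.cpp
// GCC can turn this loop into 128-bit pand operations on its own.
void mask_all(int* data, int n, int mask)
{
    for (int i = 0; i < n; ++i)
        data[i] &= mask;
}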

Edit: Another thing I should mention is that several compilers support assembly intrinsics, which would probably, IMO, be easier to use than the asm() or __asm{} syntax.
