Using SSE instructions

Posted 2019-01-31 00:44

I have a loop written in C++ which is executed for each element of a big integer array. Inside the loop, I mask some bits of the integer and then find the min and max values. I heard that if I use SSE instructions for these operations, it will run much faster than a normal loop written with bitwise AND and if-else conditions. My question is: should I go for these SSE instructions? Also, what happens if my code runs on a different processor? Will it still work, or are these instructions processor specific?

15 answers
家丑人穷心不美
#2 · 2019-01-31 01:48
  1. SSE instructions are processor specific. You can look up which processor supports which SSE version on Wikipedia.
  2. Whether SSE code will be faster depends on many factors. The first is of course whether the problem is memory-bound or CPU-bound. If the memory bus is the bottleneck, SSE will not help much. Try simplifying your integer calculations: if that makes the code faster, it's probably CPU-bound, and you have a good chance of speeding it up.
  3. Be aware that writing SIMD code is a lot harder than writing C++ code, and that the resulting code is much harder to change. Always keep the C++ code up to date: you'll want it as a comment and to check the correctness of your SIMD code.
  4. Think about using a library like Intel IPP, which implements common low-level SIMD operations optimized for various processors.
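To make point 3 concrete, here is a minimal sketch of the plain C++ version you would keep around as the reference. It assumes signed 32-bit elements; the names `MinMax` and `masked_min_max_scalar` are made up for illustration and are not from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical result type, not from the original post.
struct MinMax {
    std::int32_t min;
    std::int32_t max;
};

// Plain C++ baseline: mask each element, then track the smallest and
// largest masked value. Assumes n > 0.
MinMax masked_min_max_scalar(const std::int32_t* data, std::size_t n,
                             std::int32_t mask) {
    assert(n > 0);
    std::int32_t lo = data[0] & mask;
    std::int32_t hi = lo;
    for (std::size_t i = 1; i < n; ++i) {
        const std::int32_t v = data[i] & mask;
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    return {lo, hi};
}
```

Any SIMD version can then be run against this function on the same input to check that both return the same pair.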
爱情/是我丢掉的垃圾
#3 · 2019-01-31 01:48

I agree with the previous posters. The benefits can be quite large, but getting them may require a lot of work. Intel's documentation on these instructions is over 4K pages. You may want to check out EasySSE (a C++ wrapper library over intrinsics, plus examples), free from Ocali Inc.

I assume my affiliation with EasySSE is clear.

你好瞎i
#4 · 2019-01-31 01:49

If you use SSE instructions, you're obviously limited to processors that support them. That means x86, dating back to the Pentium III (1999), which is a long time ago.

SSE2, which, as far as I can recall, is the one that offers integer operations on the 128-bit registers, is somewhat more recent. It arrived with the Pentium 4; on the AMD side it came with the Athlon 64 and Opteron, and the earlier Athlon processors didn't support it.
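If the code has to run on processors you don't control, one option is to check at run time which SSE levels are present and fall back to the plain loop otherwise. A minimal sketch, assuming GCC or Clang on x86 (the `__builtin_cpu_supports` built-in is specific to those compilers; MSVC would need `__cpuid` instead). `SseSupport` and `detect_sse` are made-up names:

```cpp
#include <cassert>

// Hypothetical summary of the SSE levels relevant to this thread.
struct SseSupport {
    bool sse = false;
    bool sse2 = false;
    bool sse41 = false;
};

// Queries the running CPU. On compilers or architectures other than
// GCC/Clang on x86, every field stays false, so callers fall back to
// the scalar code path.
SseSupport detect_sse() {
    SseSupport s;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    s.sse = __builtin_cpu_supports("sse") != 0;
    s.sse2 = __builtin_cpu_supports("sse2") != 0;
    s.sse41 = __builtin_cpu_supports("sse4.1") != 0;
#endif
    // Each SSE level is expected to imply the earlier ones.
    assert(!s.sse2 || s.sse);
    assert(!s.sse41 || s.sse2);
    return s;
}
```

Note that every x86-64 processor supports SSE2, so on 64-bit builds the check only matters for SSE3 and later.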

In any case, you have two options for using these instructions. One is to write the entire block of code in assembly. That is probably a bad idea: it makes it virtually impossible for the compiler to optimize your code, and it's very hard for a human to write efficient assembler.

Alternatively, use the intrinsics available with your compiler (the SSE ones are usually declared in xmmintrin.h and the SSE2 ones, including the integer operations, in emmintrin.h).
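As a rough idea of what the intrinsics route looks like for the loop in the question, here is a sketch that sticks to SSE2. It assumes signed 32-bit elements. SSE2 has no packed 32-bit min/max (those are `_mm_min_epi32` / `_mm_max_epi32` in SSE4.1), so the sketch selects lanes with a compare mask instead. `MinMax`, `masked_min_max` and `HAVE_SSE2` are made-up names, and the code falls back to the plain loop when SSE2 is not enabled at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Hypothetical result type, not from the original post.
struct MinMax {
    std::int32_t min;
    std::int32_t max;
};

// Masks every element and returns the smallest and largest masked value.
// Assumes n > 0. The data pointer does not need to be aligned.
MinMax masked_min_max(const std::int32_t* data, std::size_t n,
                      std::int32_t mask) {
    assert(n > 0);
    std::int32_t lo = data[0] & mask;
    std::int32_t hi = lo;
    std::size_t i = 0;
#ifdef HAVE_SSE2
    const __m128i vmask = _mm_set1_epi32(mask);
    __m128i vlo = _mm_set1_epi32(lo);
    __m128i vhi = vlo;
    for (; i + 4 <= n; i += 4) {
        // Unaligned load of four ints, then apply the bit mask.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        v = _mm_and_si128(v, vmask);
        // lt is all-ones in lanes where v < vlo; take v there, vlo elsewhere.
        const __m128i lt = _mm_cmplt_epi32(v, vlo);
        vlo = _mm_or_si128(_mm_and_si128(lt, v), _mm_andnot_si128(lt, vlo));
        // Same selection for the running maximum.
        const __m128i gt = _mm_cmpgt_epi32(v, vhi);
        vhi = _mm_or_si128(_mm_and_si128(gt, v), _mm_andnot_si128(gt, vhi));
    }
    // Reduce the four per-lane results to one scalar each.
    std::int32_t lanes_lo[4];
    std::int32_t lanes_hi[4];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes_lo), vlo);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes_hi), vhi);
    for (int k = 0; k < 4; ++k) {
        if (lanes_lo[k] < lo) lo = lanes_lo[k];
        if (lanes_hi[k] > hi) hi = lanes_hi[k];
    }
#endif
    // Scalar tail (or the whole array when SSE2 is unavailable).
    for (; i < n; ++i) {
        const std::int32_t v = data[i] & mask;
        if (v < lo) lo = v;
        if (v > hi) hi = v;
    }
    return {lo, hi};
}
```

Whether this beats what the compiler generates from the plain loop is something to measure; modern compilers often auto-vectorize a loop this simple.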

But again, the performance may not improve. SSE code imposes additional requirements on the data it processes. The main one to keep in mind is that data should be aligned on 128-bit (16-byte) boundaries: the aligned load and store instructions fault on misaligned addresses, and the unaligned variants have historically been slower. There should also be few or no dependencies between the values loaded into the same register. A 128-bit SSE register can hold 4 ints. Adding the first and the second one together is not optimal, but adding all four ints to the corresponding 4 ints in another register will be fast.
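Both points can be illustrated with a small sketch, assuming SSE2 and 16-byte-aligned buffers (for example declared with `alignas(16)`). `add4_aligned` and `HAVE_SSE2` are made-up names. The addition is "vertical": lane k of one register is added to lane k of the other, with no dependency between lanes.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#define HAVE_SSE2 1
#endif

// Computes out[k] = a[k] + b[k] for k = 0..3.
// All three pointers must be 16-byte aligned, because the aligned
// load/store intrinsics are used; _mm_loadu_si128 / _mm_storeu_si128
// would lift that requirement.
void add4_aligned(const std::int32_t* a, const std::int32_t* b,
                  std::int32_t* out) {
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
#ifdef HAVE_SSE2
    const __m128i va = _mm_load_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_load_si128(reinterpret_cast<const __m128i*>(b));
    // One instruction adds all four lanes independently.
    _mm_store_si128(reinterpret_cast<__m128i*>(out), _mm_add_epi32(va, vb));
#else
    // Scalar fallback when SSE2 is not enabled at compile time.
    for (int k = 0; k < 4; ++k) out[k] = a[k] + b[k];
#endif
}
```

Summing the four lanes of a single register into one value (a "horizontal" operation) needs extra shuffles, which is why it is best kept out of the inner loop and done once at the end.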

It may be tempting to use a library that wraps all the low-level SSE fiddling, but that might also ruin any potential performance benefit.

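I don't know how complete SSE's integer support is, so that may also be a factor that can limit performance. The original SSE was mainly targeted at speeding up floating-point operations, and some integer operations arrived late: packed 32-bit min/max, for example, only appeared in SSE4.1.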
