C++ SSE SIMD framework [closed]

Posted 2019-03-08 10:41

Does anyone know an open-source C++ x86 SIMD intrinsics library?

Intel supplies exactly what I need in their integrated performance primitives library, but I can't use that because of the copyrights all over the place.

EDIT

I already know the intrinsics provided by the compilers. What I need is a convenient interface to use them.

8 answers
对你真心纯属浪费
#2 · 2019-03-08 11:02

There are several libraries that have emerged in recent years to abstract explicit SIMD programming. The most important ones:

The key thing to look for is a usable set of types that correctly abstract the best available SIMD registers and instructions for a given target. The library should also, obviously, stay fully portable to systems without SIMD support.
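To make that criterion concrete, here is a minimal sketch of such a type. The name float4 and the helper multiply_add are hypothetical and not taken from any library mentioned in this thread; the sketch assumes SSE is detected through the usual compiler macros and falls back to plain scalar code otherwise.

```cpp
#include <cassert>
#include <cstddef>

// Use SSE when the compiler says it is available, otherwise scalar code.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define FLOAT4_USE_SSE 1
#else
#define FLOAT4_USE_SSE 0
#endif

// Hypothetical four-lane float vector with the same interface on both paths.
struct float4 {
#if FLOAT4_USE_SSE
    __m128 v;
    static float4 load(const float* p) { return float4{_mm_loadu_ps(p)}; }
    void store(float* p) const { _mm_storeu_ps(p, v); }
    friend float4 operator+(float4 a, float4 b) { return float4{_mm_add_ps(a.v, b.v)}; }
    friend float4 operator*(float4 a, float4 b) { return float4{_mm_mul_ps(a.v, b.v)}; }
#else
    float v[4];
    static float4 load(const float* p) {
        float4 r;
        for (int i = 0; i < 4; ++i) r.v[i] = p[i];
        return r;
    }
    void store(float* p) const {
        for (int i = 0; i < 4; ++i) p[i] = v[i];
    }
    friend float4 operator+(float4 a, float4 b) {
        float4 r;
        for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
        return r;
    }
    friend float4 operator*(float4 a, float4 b) {
        float4 r;
        for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
        return r;
    }
#endif
};

// y[i] = a[i] * x[i] + y[i]; the caller never sees which path was compiled.
inline void multiply_add(const float* a, const float* x, float* y, std::size_t n) {
    assert(n == 0 || (a && x && y));
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        (float4::load(a + i) * float4::load(x + i) + float4::load(y + i)).store(y + i);
    }
    for (; i < n; ++i) y[i] = a[i] * x[i] + y[i];  // scalar tail
}
```

Real libraries go much further (more element types, wider registers, masks), but the idea is the same: user code is written once against the vector type, and the backend is chosen at compile time.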

姐就是有狂的资本
#3 · 2019-03-08 11:05

Vc is another C++ library that implements vector classes and allows writing vectorized code that is independent from the actual instruction set that is used.

Lonely孤独者°
#4 · 2019-03-08 11:09

Take a look at libsimdpp, a header-only C++ SIMD wrapper library.

The library supports several instruction sets via a single interface: SSE2, SSE3, SSSE3, SSE4.1, AVX, AVX2, AVX512F, XOP, FMA3/4, NEON, NEONv2, Altivec. Clang, GCC, MSVC and ICC are all supported.

Any differences between instruction sets are resolved by implementing the missing instructions as a combination of supported ones. As a bonus, it's possible to compile the same code for several instruction sets, link the resulting object files to a single executable and use a convenient dynamic dispatch mechanism to run the implementation most tailored to the current processor.
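The dynamic dispatch idea can be sketched without the library. The following is not libsimdpp's API; it is a hand-rolled illustration that assumes GCC or Clang on x86, using the compiler's `target` attribute and `__builtin_cpu_supports`. The function names (sum_scalar, sum_avx, select_sum, sum) are made up for the example, and other compilers or architectures simply get the scalar version.

```cpp
#include <cassert>
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#else
#define HAVE_X86_DISPATCH 0
#endif

using sum_fn = float (*)(const float*, std::size_t);

// Portable baseline that runs on any processor.
static float sum_scalar(const float* p, std::size_t n) {
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

#if HAVE_X86_DISPATCH
// Compiled for AVX even if the rest of the file is not.
__attribute__((target("avx")))
static float sum_avx(const float* p, std::size_t n) {
    __m256 acc = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) acc = _mm256_add_ps(acc, _mm256_loadu_ps(p + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (float x : lanes) s += x;      // horizontal sum of the 8 lanes
    for (; i < n; ++i) s += p[i];      // scalar tail
    return s;
}
#endif

// Pick the best implementation the current processor can run.
static sum_fn select_sum() {
#if HAVE_X86_DISPATCH
    if (__builtin_cpu_supports("avx")) return sum_avx;
#endif
    return sum_scalar;
}

// Public entry point: the choice is made once, on first call.
float sum(const float* p, std::size_t n) {
    assert(n == 0 || p != nullptr);
    static const sum_fn impl = select_sum();
    return impl(p, n);
}
```

Callers only ever use sum(); which kernel runs depends on the machine. A library-provided mechanism does the same thing, but generates the per-instruction-set variants from one source instead of requiring each to be written by hand.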

▲ chillily
#5 · 2019-03-08 11:09

I wrote a GLSL-style library that compiles to near-optimal assembly code.

A very common operation - cross product:

vec4 cross(const vec4 &a, const vec4 &b)
{
    return a.yzxw * b.zxyw - a.zxyw * b.yzxw;
}

would be converted to this assembly code using glsl-sse2:

_Z5crossRK4vec4S1_:
    movaps    (%rsi), %xmm1
    movaps    (%rdx), %xmm2
    pshufd    $201, %xmm1, %xmm5
    pshufd    $210, %xmm2, %xmm0
    pshufd    $210, %xmm1, %xmm4
    pshufd    $201, %xmm2, %xmm3
    mulps     %xmm0, %xmm5
    mulps     %xmm3, %xmm4
    subps     %xmm4, %xmm5
    movaps    %xmm5, (%rdi)
    ret

Please note that the library isn't perfect yet and most likely has undiscovered bugs, as it is still new.
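For comparison, the same cross product can be written by hand with plain SSE intrinsics. This is only a sketch of what the swizzle expression expands to, not code from glsl-sse2; the names cross_sse and cross are made up for the example, and it assumes an x86 target with SSE. The shuffle constants encode to the same immediates, 201 and 210, that appear in the listing above.

```cpp
#include <cassert>
#include <xmmintrin.h>

// Lanes are ordered x, y, z, w (lane 0 = x).
inline __m128 cross_sse(__m128 a, __m128 b) {
    // _MM_SHUFFLE(3, 0, 2, 1) == 201 selects y, z, x, w
    // _MM_SHUFFLE(3, 1, 0, 2) == 210 selects z, x, y, w
    const __m128 a_yzxw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 0, 2, 1));
    const __m128 b_zxyw = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 1, 0, 2));
    const __m128 a_zxyw = _mm_shuffle_ps(a, a, _MM_SHUFFLE(3, 1, 0, 2));
    const __m128 b_yzxw = _mm_shuffle_ps(b, b, _MM_SHUFFLE(3, 0, 2, 1));
    // a.yzxw * b.zxyw - a.zxyw * b.yzxw; the w lane comes out as 0
    return _mm_sub_ps(_mm_mul_ps(a_yzxw, b_zxyw), _mm_mul_ps(a_zxyw, b_yzxw));
}

// Convenience wrapper over plain float arrays (unaligned loads and stores).
inline void cross(const float a[4], const float b[4], float out[4]) {
    assert(a && b && out);
    _mm_storeu_ps(out, cross_sse(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```

The intrinsic version needs four named temporaries and two magic constants for a one-line formula, which is the readability gap a swizzle-style wrapper is meant to close.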

Emotional °昔
#6 · 2019-03-08 11:18

You might want to look at macstl. Although it was originally developed for the Mac (and PowerPC), it now works on Linux and x86 too.

Also, if you're working with images, look at OpenCV, which has SSE-optimised routines for many common image processing tasks and offers both C and C++ APIs.

SAY GOODBYE
#7 · 2019-03-08 11:19

Have a look at AMD's SSEPlus project; it might be what you're after.
